spark-bigquery

Google BigQuery data source for Apache Spark

spark-bigquery provides a Google BigQuery data source for Apache Spark, built on the new Google Cloud client libraries for the Google BigQuery API. It supports "direct" import/export, where records are streamed directly from or to BigQuery. Data may also be imported or exported via intermediate data extracts on Google Cloud Storage (GCS).


Tags

  • BigQuery
  • google-cloud

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages miraisolutions:spark-bigquery:0.1.0-s_2.11

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "miraisolutions/spark-bigquery:0.1.0-s_2.11"

Otherwise, add the resolver and the dependency directly:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "miraisolutions" % "spark-bigquery" % "0.1.0-s_2.11"
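For context, the two settings above might sit in a complete build file roughly as follows. This is only a sketch: the project name, the exact Scala patch version, and the Spark version are assumptions chosen for illustration and are not stated on this page. The only facts taken from the listing are the resolver URL and the package coordinates.

```scala
// build.sbt (minimal sketch; values marked "assumption" are not from the package page)

name := "bigquery-example"        // hypothetical project name

// The "s_2.11" suffix in the package version indicates a Scala 2.11 build,
// so the project should use a 2.11.x compiler. The patch version is an assumption.
scalaVersion := "2.11.12"

// Repository hosting Spark Packages artifacts, as given in the listing.
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies ++= Seq(
  // Spark itself is normally supplied by the cluster at runtime, hence "provided".
  // The Spark version here is an assumption; match it to your cluster.
  "org.apache.spark" %% "spark-sql" % "2.3.1" % "provided",

  // Single % (not %%): the Scala version is already encoded in the version
  // string, so sbt must not append a _2.11 suffix to the artifact name.
  "miraisolutions" % "spark-bigquery" % "0.1.0-s_2.11"
)
```

Note the single `%` on the spark-bigquery line. With `%%`, sbt would look for an artifact named `spark-bigquery_2.11`, which does not match the coordinates shown above.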

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>miraisolutions</groupId>
    <artifactId>spark-bigquery</artifactId>
    <version>0.1.0-s_2.11</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Releases

Version: 0.1.0-s_2.11 (commit e19e1a) / Date: 2018-07-31 / License: MIT