sparksql-protobuf

Read SparkSQL parquet file as RDD[Protobuf]

@saurfang

This library provides utilities to work with Protobuf objects in SparkSQL.
It provides a way to read Parquet files written by SparkSQL back as an RDD of compatible Protobuf objects.
It can also convert an RDD of Protobuf objects into a DataFrame.
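
For orientation, here is a minimal sketch of both directions. The class ProtoParquetRDD, the implicit import com.github.saurfang.parquet.proto.spark.sql._, and the Person message class are illustrative assumptions rather than confirmed API; consult the project homepage for the authoritative usage.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumed library entry points -- names may differ from the actual API:
import com.github.saurfang.parquet.proto.spark.ProtoParquetRDD
import com.github.saurfang.parquet.proto.spark.sql._

// Person is a placeholder for any compiled Protobuf message class on the classpath.
import com.example.protos.Person

object ProtoRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sparksql-protobuf-sketch"))
    val sqlContext = new SQLContext(sc)

    // Read a Parquet file previously written by SparkSQL back as an RDD[Person]
    // (assumes ProtoParquetRDD takes the SparkContext, a path, and the message class).
    val personsRdd = new ProtoParquetRDD(sc, "persons.parquet", classOf[Person])
    personsRdd.take(5).foreach(println)

    // Convert an RDD of Protobuf objects into a DataFrame
    // (assumes the implicit import above adds a suitable createDataFrame overload).
    val personsDf = sqlContext.createDataFrame(personsRdd)
    personsDf.registerTempTable("persons")
    sqlContext.sql("SELECT * FROM persons").show()

    sc.stop()
  }
}

The SQLContext and registerTempTable calls follow the Spark 1.3/1.4 API line that this release targets.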


Tags

  • sql
  • data source
  • protobuf

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages saurfang:sparksql-protobuf:0.1.2-s_2.10

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "saurfang/sparksql-protobuf:0.1.2-s_2.10"

Otherwise,

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "saurfang" % "sparksql-protobuf" % "0.1.2-s_2.10"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>saurfang</groupId>
    <artifactId>sparksql-protobuf</artifactId>
    <version>0.1.2-s_2.10</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Releases

Version: 0.1.2-s_2.10 ( 765e28 | zip | jar ) / Date: 2015-08-18 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 22%, 1.1.0 - 65%, 1.2.0 - 71%, 1.3.0 - 100%, 1.4.0 - 97%