sparksql-protobuf (homepage)

Read SparkSQL parquet file as RDD[Protobuf]

@saurfang

This library provides utilities to work with Protobuf objects in SparkSQL.
It provides a way to read a parquet file written by SparkSQL back as an RDD of compatible protobuf objects.
It can also convert an RDD of protobuf objects into a DataFrame.
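For illustration, a sketch of both directions is below. The package name and the `ProtoParquetRDD` class follow this library's conventions but are assumptions here, as is the generated `Person` protobuf class; check the project README for the exact API.

```scala
// Assumed import of the library's SQL helpers (provides protobuf <-> Row conversions).
import com.github.saurfang.parquet.proto.spark.sql._
// Hypothetical protobuf-generated class for this example.
import com.example.protos.Person

// RDD[Protobuf] -> DataFrame: schema is inferred from the protobuf descriptor.
val personsRDD = sc.parallelize(Seq(
  Person.newBuilder().setName("Alice").setAge(30).build()
))
val personsDF = sqlContext.createDataFrame(personsRDD)
personsDF.write.parquet("persons.parquet")

// Parquet written by SparkSQL -> RDD[Person], read back as protobuf objects.
val protoRDD = new ProtoParquetRDD(sc, "persons.parquet", classOf[Person])
protoRDD.map(_.getName).collect()
```

This runs inside `spark-shell` (where `sc` and `sqlContext` are predefined) with the package on the classpath as shown in the How to section.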


Tags

  • sql
  • data source
  • protobuf

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages saurfang:sparksql-protobuf:0.1.2-s_2.10

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "saurfang/sparksql-protobuf:0.1.2-s_2.10"

Otherwise,

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "saurfang" % "sparksql-protobuf" % "0.1.2-s_2.10"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>saurfang</groupId>
    <artifactId>sparksql-protobuf</artifactId>
    <version>0.1.2-s_2.10</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>

Releases

Version: 0.1.2-s_2.10 ( 765e28 | zip | jar ) / Date: 2015-08-18 / License: Apache-2.0 / Scala version: 2.10
