sparksql-protobuf
Read SparkSQL parquet file as RDD[Protobuf]
by @saurfang
This library provides utilities for working with Protobuf objects in SparkSQL.
It can read a parquet file written by SparkSQL back as an RDD of compatible protobuf objects.
It can also convert an RDD of protobuf objects into a DataFrame.
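A minimal round-trip sketch of the two directions described above. This is an assumption-laden example, not a verbatim excerpt of the library's documentation: the import paths and the names `createDataFrameFromProto` and `ProtoParquetRDD` should be checked against the 0.1.2 release you depend on, and `Person` stands in for any protoc-generated message class in your own project.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumed package layout for sparksql-protobuf 0.1.2 -- verify against the release jar.
import com.github.saurfang.parquet.proto.spark.ProtoParquetRDD
import com.github.saurfang.parquet.proto.spark.sql._

object ProtoRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("proto-roundtrip"))
    val sqlContext = new SQLContext(sc)

    // Direction 1: RDD[Protobuf] -> DataFrame.
    // `Person` is a hypothetical protoc-generated message class.
    val persons = sc.parallelize(Seq(Person.newBuilder().setName("Ada").build()))
    val personsDF = sqlContext.createDataFrameFromProto(persons)
    personsDF.write.parquet("persons.parquet")

    // Direction 2: parquet written by SparkSQL -> RDD[Protobuf].
    val protoRDD = new ProtoParquetRDD(sc, "persons.parquet", classOf[Person])
    protoRDD.collect().foreach(println)

    sc.stop()
  }
}
```

Note that this build targets Scala 2.10, so it pairs with the Spark 1.x `SQLContext` API rather than the later `SparkSession` entry point.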
How to
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages saurfang:sparksql-protobuf:0.1.2-s_2.10
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "saurfang/sparksql-protobuf:0.1.2-s_2.10"
Otherwise,
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "saurfang" % "sparksql-protobuf" % "0.1.2-s_2.10"
Maven
In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>saurfang</groupId>
    <artifactId>sparksql-protobuf</artifactId>
    <version>0.1.2-s_2.10</version>
  </dependency>
</dependencies>

<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>
Releases
Version: 0.1.2-s_2.10 (commit 765e28) / Date: 2015-08-18 / License: Apache-2.0 / Scala version: 2.10