Spark-SQL-on-HBase
Native, optimized access to HBase data through Spark SQL/DataFrame interfaces
@Huawei-Spark
This technology provides scalable and reliable Spark SQL/DataFrame access
to NoSQL data in HBase through HBase's "native" data access APIs. HBase
pushdown capabilities, in the form of projection pruning, coprocessors, and custom
filtering, are fully exploited to support ultra-low-latency processing. A
novel technique based on partial evaluation is introduced to process
virtually arbitrarily complex logical predicates in order to precisely prune partitions,
generate partition-specific predicates, and enable intelligent jumps in scans
over multidimensional data sets. Overall, the system is designed for ad hoc, interactive queries.
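The partial-evaluation idea described above can be illustrated with a small, self-contained Scala sketch. This is not the package's implementation. The predicate types (Lt, Ge, And, Or), the integer row keys, and the half-open KeyRange per region are simplifying assumptions made for illustration. Each predicate is simplified using only what a region's key bounds guarantee. If it reduces to false, the region is pruned. If it reduces to true, no filter is needed. Otherwise the residual expression becomes that region's partition-specific predicate.

```scala
// Hypothetical sketch (not the Spark-SQL-on-HBase implementation): partial
// evaluation of a predicate against a region's row-key range.
// Assumption: row keys are plain Ints and each region covers [start, end).

sealed trait Pred
case class Lt(v: Int) extends Pred              // key < v
case class Ge(v: Int) extends Pred              // key >= v
case class And(l: Pred, r: Pred) extends Pred
case class Or(l: Pred, r: Pred) extends Pred
case object AlwaysTrue extends Pred
case object AlwaysFalse extends Pred

// Half-open key range of one region; assumes start < end.
case class KeyRange(start: Int, end: Int)

object PartialEval {
  // Simplify `p` using only what the range guarantees about its keys.
  def eval(p: Pred, r: KeyRange): Pred = p match {
    case Lt(v) =>
      if (r.end <= v) AlwaysTrue          // every key in the range is < v
      else if (r.start >= v) AlwaysFalse  // no key in the range is < v
      else p                              // undecided: keep as residual
    case Ge(v) =>
      if (r.start >= v) AlwaysTrue        // every key in the range is >= v
      else if (r.end <= v) AlwaysFalse    // no key in the range is >= v
      else p
    case And(a, b) => (eval(a, r), eval(b, r)) match {
      case (AlwaysFalse, _) | (_, AlwaysFalse) => AlwaysFalse
      case (AlwaysTrue, x)                     => x
      case (x, AlwaysTrue)                     => x
      case (x, y)                              => And(x, y)
    }
    case Or(a, b) => (eval(a, r), eval(b, r)) match {
      case (AlwaysTrue, _) | (_, AlwaysTrue) => AlwaysTrue
      case (AlwaysFalse, x)                  => x
      case (x, AlwaysFalse)                  => x
      case (x, y)                            => Or(x, y)
    }
    case AlwaysTrue | AlwaysFalse => p
  }

  // Drop regions whose residual predicate is AlwaysFalse; keep the
  // residual as the region-specific filter for the remaining regions.
  def plan(p: Pred, regions: Seq[KeyRange]): Seq[(KeyRange, Pred)] =
    regions.map(r => (r, eval(p, r))).filter(_._2 != AlwaysFalse)

  def main(args: Array[String]): Unit = {
    // (key >= 10 AND key < 20) OR key >= 90
    val pred = Or(And(Ge(10), Lt(20)), Ge(90))
    val regions = Seq(KeyRange(0, 10), KeyRange(10, 50), KeyRange(50, 100))
    plan(pred, regions).foreach { case (r, residual) => println(s"$r -> $residual") }
  }
}
```

Running main prints KeyRange(10,50) -> Lt(20) and KeyRange(50,100) -> Ge(90). The region [0, 10) is pruned entirely, and each surviving region carries only the part of the original predicate that its key bounds cannot decide.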
The current version, 0.1.0, runs on the Apache Spark 1.4.0 release.
How to
Include this package in your Spark applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages Huawei-Spark:Spark-SQL-on-HBase:1.0.0
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "Huawei-Spark/Spark-SQL-on-HBase:1.0.0"
Otherwise,
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"
libraryDependencies += "Huawei-Spark" % "Spark-SQL-on-HBase" % "1.0.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>Huawei-Spark</groupId>
    <artifactId>Spark-SQL-on-HBase</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>
Releases
Version: 1.0.0 (380fdf) / Date: 2015-07-17 / License: Apache-2.0