Native, optimized access to HBase Data through Spark SQL/DataFrame Interfaces
@Huawei-Spark
This technology provides scalable and reliable Spark SQL/DataFrame access
to NoSQL data in HBase through HBase's "native" data access APIs. HBase
pushdown capabilities, in the form of projection pruning, coprocessors, and custom
filtering, are used to support ultra-low-latency processing. A
novel technique based on partial evaluation is introduced to process
logical predicates of virtually arbitrary complexity in order to prune partitions precisely,
generate partition-specific predicates, and enable intelligent jumps in scans
over multidimensional data sets. Overall, the system is designed for ad hoc, interactive queries.
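The partition-pruning idea described above can be illustrated with a small, self-contained Python sketch. It is not the package's implementation. It assumes a single integer row key, partitions given as inclusive key ranges, and a hypothetical tuple encoding for predicates. The names `partial_eval` and `prune_partitions` are invented for this illustration. Evaluating a predicate against a whole key range yields `True` (every key matches), `False` (the partition can be skipped), or a simplified residual predicate specific to that partition.

```python
# Illustrative sketch only: hypothetical names, not the package's API.
# A predicate over one integer row key is a nested tuple:
#   ("cmp", op, value) | ("and", p, q) | ("or", p, q) | ("not", p)
# A partition is an inclusive key range (lo, hi).

def partial_eval(pred, lo, hi):
    """Evaluate pred against every key in [lo, hi] at once.

    Returns True (all keys match), False (no key matches), or a
    simplified residual predicate that still has to be checked per row.
    """
    kind = pred[0]
    if kind == "cmp":
        _, op, v = pred
        if op == "<":
            always, never = hi < v, lo >= v
        elif op == "<=":
            always, never = hi <= v, lo > v
        elif op == ">":
            always, never = lo > v, hi <= v
        elif op == ">=":
            always, never = lo >= v, hi < v
        elif op == "=":
            always, never = (lo == hi == v), (v < lo or v > hi)
        else:
            raise ValueError("unknown operator: %r" % (op,))
        if always:
            return True
        if never:
            return False
        return pred  # undecided for this range: keep as residual
    if kind == "not":
        inner = partial_eval(pred[1], lo, hi)
        if inner is True:
            return False
        if inner is False:
            return True
        return ("not", inner)
    if kind in ("and", "or"):
        left = partial_eval(pred[1], lo, hi)
        right = partial_eval(pred[2], lo, hi)
        # For "and", False dominates and True is neutral; for "or" the reverse.
        dominant = (kind == "or")
        if left is dominant or right is dominant:
            return dominant
        if left is (not dominant):
            return right
        if right is (not dominant):
            return left
        return (kind, left, right)
    raise ValueError("unknown predicate node: %r" % (kind,))


def prune_partitions(pred, partitions):
    """Drop partitions that cannot match; pair the rest with their residual."""
    kept = []
    for lo, hi in partitions:
        residual = partial_eval(pred, lo, hi)
        if residual is not False:
            kept.append(((lo, hi), residual))
    return kept


# key >= 15 AND (key < 25 OR key = 40) over five key ranges
query = ("and", ("cmp", ">=", 15),
                ("or", ("cmp", "<", 25), ("cmp", "=", 40)))
ranges = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
result = prune_partitions(query, ranges)
for part, residual in result:
    print(part, residual)
```

In this sketch, two of the five ranges are dropped without being scanned, and each surviving range is left with a single comparison instead of the full expression.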
The current version, 0.1.0, runs on the Apache Spark 1.4.0 release.
Include this package in your Spark applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages Huawei-Spark:Spark-SQL-on-HBase:1.0.0
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "Huawei-Spark/Spark-SQL-on-HBase:1.0.0"
Otherwise, add:
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "Huawei-Spark" % "Spark-SQL-on-HBase" % "1.0.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>Huawei-Spark</groupId>
    <artifactId>Spark-SQL-on-HBase</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>