Spark-SQL-on-HBase

Native, optimized access to HBase data through Spark SQL/DataFrame interfaces

This technology provides scalable and reliable Spark SQL/DataFrame access
to NoSQL data in HBase through HBase's "native" data access APIs. HBase
pushdown capabilities, in the form of projection pruning, coprocessors, and
custom filtering, are exploited to support ultra-low-latency processing. A
novel technique based on partial evaluation processes virtually arbitrarily
complex logical predicates to precisely prune partitions, generate
partition-specific predicates, and enable intelligent jumps in scans over
multidimensional data sets. Overall, the system is designed for ad hoc,
interactive queries.
The current version, 0.1.0, runs on the Apache Spark 1.4.0 release.
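To make the partial-evaluation idea more concrete, here is a minimal Scala sketch of the general technique. It is not the package's actual implementation. It assumes a single integer row-key column and partitions (for example, HBase regions) that each hold a half-open key range [start, end). Real HBase row keys are byte arrays and are often composite. All names (`Pred`, `Partition`, `partialEval`, `plan`) are hypothetical.

The predicate is evaluated against only what is known about a partition, namely its key bounds. Comparisons that the bounds decide fold to constants, and the rest survive as a residual predicate. A partition whose residual is `AlwaysFalse` is pruned. Every other partition is scanned with its own, usually simpler, residual filter.

```scala
// Hypothetical sketch of predicate partial evaluation for partition pruning.
// Not the Spark-SQL-on-HBase implementation; one Int key column is assumed.
object PartialEvalSketch {

  // Predicates over a single integer row key.
  sealed trait Pred
  case object AlwaysTrue extends Pred
  case object AlwaysFalse extends Pred
  final case class Lt(v: Int) extends Pred            // key < v
  final case class Ge(v: Int) extends Pred            // key >= v
  final case class And(l: Pred, r: Pred) extends Pred
  final case class Or(l: Pred, r: Pred) extends Pred

  // A partition (e.g. an HBase region) covering keys in [start, end).
  final case class Partition(id: Int, start: Int, end: Int)

  // Evaluate `p` using only the fact that start <= key < end.
  // Comparisons the bounds settle become constants; the rest are kept.
  def partialEval(p: Pred, part: Partition): Pred = p match {
    case Lt(v) =>
      if (part.end <= v) AlwaysTrue          // largest key is end - 1 < v
      else if (part.start >= v) AlwaysFalse  // smallest key is already >= v
      else p
    case Ge(v) =>
      if (part.start >= v) AlwaysTrue
      else if (part.end <= v) AlwaysFalse
      else p
    case And(l, r) =>
      (partialEval(l, part), partialEval(r, part)) match {
        case (AlwaysFalse, _) | (_, AlwaysFalse) => AlwaysFalse
        case (AlwaysTrue, x)                     => x
        case (x, AlwaysTrue)                     => x
        case (x, y)                              => And(x, y)
      }
    case Or(l, r) =>
      (partialEval(l, part), partialEval(r, part)) match {
        case (AlwaysTrue, _) | (_, AlwaysTrue) => AlwaysTrue
        case (AlwaysFalse, x)                  => x
        case (x, AlwaysFalse)                  => x
        case (x, y)                            => Or(x, y)
      }
    case constant => constant                // AlwaysTrue / AlwaysFalse
  }

  // Drop partitions whose residual is AlwaysFalse; pair the rest with
  // their partition-specific residual predicate.
  def plan(p: Pred, parts: Seq[Partition]): Seq[(Partition, Pred)] =
    parts.map(part => part -> partialEval(p, part)).filter(_._2 != AlwaysFalse)

  def main(args: Array[String]): Unit = {
    val regions = Seq(
      Partition(0, 0, 100), Partition(1, 100, 200),
      Partition(2, 200, 300), Partition(3, 300, 400))
    // key >= 50 AND (key < 120 OR (key >= 250 AND key < 280))
    val pred = And(Ge(50), Or(Lt(120), And(Ge(250), Lt(280))))
    plan(pred, regions).foreach { case (part, residual) =>
      println(s"scan partition ${part.id} with filter $residual")
    }
  }
}
```

With four regions of 100 keys each, the region [300, 400) folds to `AlwaysFalse` and is skipped entirely. The other three regions are scanned with the residuals `Ge(50)`, `Lt(120)`, and `And(Ge(250), Lt(280))`, respectively. Each residual is a partition-specific predicate that is cheaper than the original one.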


Tags

  • hbase

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages Huawei-Spark:Spark-SQL-on-HBase:1.0.0

sbt

If you use the sbt-spark-package plugin, add the following to your sbt build file:

spDependencies += "Huawei-Spark/Spark-SQL-on-HBase:1.0.0"

Otherwise, add the following to your sbt build file:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "Huawei-Spark" % "Spark-SQL-on-HBase" % "1.0.0"
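A minimal `build.sbt` that combines the resolver and dependency lines above might look like the sketch below. The project name is hypothetical. Two settings are assumptions about a typical setup rather than requirements stated by the package. The first is the Scala version, 2.10.4, which is the default Scala build of Spark 1.4.0. The second is marking `spark-sql` as `provided`, so the cluster's Spark is used at runtime.

```scala
// build.sbt (sketch; project name and versions are assumptions)
name := "hbase-sql-example"   // hypothetical project name

scalaVersion := "2.10.4"      // assumed to match Spark 1.4.0's default Scala build

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies ++= Seq(
  // Spark itself is supplied by the cluster at runtime, hence "provided".
  "org.apache.spark" %% "spark-sql" % "1.4.0" % "provided",
  "Huawei-Spark" % "Spark-SQL-on-HBase" % "1.0.0"
)
```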

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>Huawei-Spark</groupId>
    <artifactId>Spark-SQL-on-HBase</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Releases

Version: 1.0.0 (380fdf, zip, jar) / Date: 2015-07-17 / License: Apache-2.0

Spark Scala/Java API compatibility (by Spark version): 1.0.0: 17%, 1.1.0: 25%, 1.2.0: 32%, 1.3.0: 59%, 1.4.0: 100%