spark-indexedrdd

An efficient updatable key-value store for Apache Spark

@amplab

IndexedRDD extends `RDD[(K, V)]` by enforcing key uniqueness and pre-indexing the entries, which makes joins, point lookups, updates, and deletions efficient. It is implemented by (1) hash-partitioning the entries by key, (2) maintaining a radix tree (PART) index within each partition, and (3) using purely functional (immutable and efficiently updatable) data structures, so that modifications and deletions produce a new version without mutating the old one.
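The three implementation steps above can be illustrated with a small, Spark-free sketch in plain Scala. This is not the library's code or API: `TinyIndexedStore` and `fromPairs` are hypothetical names introduced here for illustration, and an immutable `Map` stands in for the per-partition PART radix tree. The sketch only aims to show how hash-partitioning by key, a per-partition index, and immutable updates fit together.

```scala
// Hypothetical illustration only; TinyIndexedStore is NOT part of spark-indexedrdd.
// An immutable Map stands in for the PART radix tree kept in each partition.
final class TinyIndexedStore[K, V] private (partitions: Vector[Map[K, V]]) {

  // Step 1: hash-partition by key. floorMod keeps the index non-negative
  // even when hashCode is negative.
  private def partitionOf(key: K): Int =
    java.lang.Math.floorMod(key.hashCode, partitions.length)

  // Step 2: a point lookup touches only the index of the owning partition.
  def get(key: K): Option[V] = partitions(partitionOf(key)).get(key)

  // Step 3: updates return a new store; the old one is left unchanged.
  def put(key: K, value: V): TinyIndexedStore[K, V] = {
    val i = partitionOf(key)
    new TinyIndexedStore[K, V](partitions.updated(i, partitions(i).updated(key, value)))
  }

  def delete(key: K): TinyIndexedStore[K, V] = {
    val i = partitionOf(key)
    new TinyIndexedStore[K, V](partitions.updated(i, partitions(i) - key))
  }
}

object TinyIndexedStore {
  // Builds a store from pairs. Key uniqueness is enforced by letting a later
  // pair overwrite an earlier one with the same key (a choice made for this sketch).
  def fromPairs[K, V](entries: Seq[(K, V)], numPartitions: Int): TinyIndexedStore[K, V] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val empty = new TinyIndexedStore[K, V](Vector.fill(numPartitions)(Map.empty[K, V]))
    entries.foldLeft(empty) { case (store, (k, v)) => store.put(k, v) }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val v1 = TinyIndexedStore.fromPairs(Seq(1L -> 0, 2L -> 0, 3L -> 0), numPartitions = 2)
    val v2 = v1.put(2L, 42)   // point update
    val v3 = v2.delete(3L)    // point deletion
    println(v1.get(2L))       // Some(0): the original version is unmodified
    println(v2.get(2L))       // Some(42)
    println(v3.get(3L))       // None
  }
}
```

Because every update returns a new value, older versions such as `v1` stay readable after `put` and `delete`. In this sketch, the unchanged partitions are shared between versions rather than copied.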


Tags

  • kv
  • core
  • anothertag

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages amplab:spark-indexedrdd:0.4.0

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "amplab/spark-indexedrdd:0.4.0"

Otherwise, add the resolver and the dependency directly:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "amplab" % "spark-indexedrdd" % "0.4.0"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>amplab</groupId>
    <artifactId>spark-indexedrdd</artifactId>
    <version>0.4.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Releases

Version: 0.4.0 (03f417) / Date: 2017-01-11 / License: Apache-2.0 / Scala version: 2.11

Version: 0.3 (274078) / Date: 2015-09-10 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 50%, 1.1.0 - 68%, 1.2.0 - 74%, 1.3.0 - 75%, 1.4.0 - 75%, 1.5.0 - 100%

Version: 0.2 (404091) / Date: 2015-08-19 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 53%, 1.1.0 - 72%, 1.2.0 - 79%, 1.3.0 - 82%, 1.4.0 - 100%

Version: 0.1 (c79a14) / Date: 2015-03-31 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 87%, 1.1.0 - 100%, 1.2.0 - 81%, 1.3.0 - 81%