spark-indexedrdd (homepage)
An efficient updatable key-value store for Apache Spark
@amplab
IndexedRDD extends `RDD[(K, V)]` by enforcing key uniqueness and pre-indexing the entries for efficient joins and point lookups, updates, and deletions. It is implemented by (1) hash-partitioning the entries by key, (2) maintaining a radix tree (PART) index within each partition, and (3) using purely functional (immutable and efficiently updatable) data structures to enable efficient modifications and deletions.
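The three implementation steps above can be illustrated with a small conceptual sketch in plain Scala. This is not the library's code or API: `MiniIndexed` and `fromEntries` are hypothetical names, a `Vector` of partitions stands in for a distributed RDD, and Scala's immutable `Map` stands in for the per-partition radix tree index. It only aims to show why hash-partitioning plus immutable per-partition indexes gives unique keys, cheap point operations, and updates that leave the original collection unchanged.

```scala
// Conceptual sketch only; all names here are hypothetical, not IndexedRDD's API.
// An immutable Map per partition stands in for the radix tree index and
// enforces key uniqueness within that partition.
final case class MiniIndexed[V](partitions: Vector[Map[Long, V]]) {

  // Step (1): hash-partition by key. floorMod keeps the result non-negative.
  private def partitionOf(key: Long): Int =
    java.lang.Math.floorMod(key.hashCode, partitions.length)

  // Step (2): a point lookup consults only the index of the owning partition.
  def get(key: Long): Option[V] =
    partitions(partitionOf(key)).get(key)

  // Step (3): an update builds a new value and reuses every untouched partition.
  def put(key: Long, value: V): MiniIndexed[V] = {
    val i = partitionOf(key)
    copy(partitions = partitions.updated(i, partitions(i).updated(key, value)))
  }

  // Deletion works the same way: only the owning partition's index is replaced.
  def delete(key: Long): MiniIndexed[V] = {
    val i = partitionOf(key)
    copy(partitions = partitions.updated(i, partitions(i) - key))
  }
}

object MiniIndexed {
  // Build from key-value pairs; a later duplicate key overwrites an earlier one.
  def fromEntries[V](numPartitions: Int, entries: Seq[(Long, V)]): MiniIndexed[V] = {
    val empty = new MiniIndexed(Vector.fill(numPartitions)(Map.empty[Long, V]))
    entries.foldLeft(empty) { case (acc, (k, v)) => acc.put(k, v) }
  }
}

// Usage: the original value is unaffected by later updates and deletions.
val base = MiniIndexed.fromEntries(4, (1L to 10L).map(k => (k, 0)))
val updated = base.put(3L, 42).delete(7L)
```

Because `put` and `delete` return new values, `base` still maps key 3 to 0 and still contains key 7 after `updated` is derived from it.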
How to
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages amplab:spark-indexedrdd:0.4.0
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "amplab/spark-indexedrdd:0.4.0"
Otherwise,
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "amplab" % "spark-indexedrdd" % "0.4.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>amplab</groupId>
    <artifactId>spark-indexedrdd</artifactId>
    <version>0.4.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>
Releases
Version: 0.4.0 ( 03f417 | zip | jar ) / Date: 2017-01-11 / License: Apache-2.0 / Scala version: 2.11
Version: 0.3 ( 274078 | zip | jar ) / Date: 2015-09-10 / License: Apache-2.0 / Scala version: 2.10
Version: 0.2 ( 404091 | zip | jar ) / Date: 2015-08-19 / License: Apache-2.0 / Scala version: 2.10
Version: 0.1 ( c79a14 | zip | jar ) / Date: 2015-03-31 / License: Apache-2.0 / Scala version: 2.10