spark-sorted

Secondary sort and streaming reduce for Spark

@tresata

Spark-sorted is a library that aims to make non-reduce-type operations on very large groups possible in Spark, including support for processing each group's values in sorted order. To do so, it relies on Spark's new sort-based shuffle and never materializes the group for a given key. Instead, a group is represented by consecutive rows within a partition, and those rows are processed with a map-like (iterator-based, streaming) operation.
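
The mechanism described above can be illustrated without Spark. The following is a minimal sketch in plain Scala, assuming a single partition whose rows the shuffle has already sorted by (key, value); `streamGroups` and `SecondarySortSketch` are hypothetical names written for this example and are not spark-sorted's actual API. It shows how each key's group can be handed to a function as a lazy iterator over consecutive rows, so the group is never collected into memory, and how the secondary sort lets that function rely on value order.

```scala
// Plain-Scala sketch (no Spark dependency) of secondary sort + streaming per-key processing.
// Assumption: `sorted` plays the role of one partition after a shuffle that
// partitioned on the key and sorted rows by (key, value).
object SecondarySortSketch {

  /** Hypothetical helper (not spark-sorted's API): hands each key's group to `f`
    * as a lazy iterator over its consecutive rows, so no group is ever collected.
    * Requires all rows of a key to be adjacent in `sorted`. */
  def streamGroups[K, V, W](sorted: Iterator[(K, V)])(f: Iterator[V] => W): Iterator[(K, W)] =
    new Iterator[(K, W)] {
      private val rows = sorted.buffered

      def hasNext: Boolean = rows.hasNext

      def next(): (K, W) = {
        val key = rows.head._1
        // View over the current key's rows; it stops at the first row with a different key.
        val group = new Iterator[V] {
          def hasNext: Boolean = rows.hasNext && rows.head._1 == key
          def next(): V =
            if (hasNext) rows.next()._2
            else throw new NoSuchElementException("end of group")
        }
        val result = f(group)
        // Skip any rows `f` left unread so the next call starts at the next key.
        while (group.hasNext) group.next()
        (key, result)
      }
    }

  def main(args: Array[String]): Unit = {
    val records = Seq(("b", 3), ("a", 5), ("b", 1), ("a", 1), ("a", 2))
    // Simulate the shuffle's sort on (key, value) for a single partition.
    def partition = records.sorted.iterator

    // Ordered values: the two smallest values per key, reading only a prefix of each group.
    println(streamGroups(partition)(vs => vs.take(2).toList).toList)
    // prints List((a,List(1, 2)), (b,List(1, 3)))

    // Streaming reduce: sum per key without building the group in memory.
    println(streamGroups(partition)(vs => vs.sum).toList)
    // prints List((a,8), (b,4))
  }
}
```

Because the helper drains whatever the user function leaves unread, a function that looks only at a prefix of its group (such as `take(2)`) still leaves the shared iterator positioned at the next key. Memory use therefore does not grow with group size, beyond whatever the user function itself retains.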


Tags

  • core

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages com.tresata:spark-sorted_2.11:0.4.0

sbt

If you use the sbt-spark-package plugin, add the following to your sbt build file:

spDependencies += "tresata/spark-sorted:0.4.0"

Otherwise, add:

libraryDependencies += "com.tresata" % "spark-sorted_2.11" % "0.4.0"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.tresata</groupId>
    <artifactId>spark-sorted_2.11</artifactId>
    <version>0.4.0</version>
  </dependency>
</dependencies>

Releases

Version: 0.4.0-s_2.11 (72fd27) / Date: 2015-11-03 / License: Apache-2.0 / Scala version: 2.11

Spark Scala/Java API compatibility: 1.2.0 - 100%, 1.3.0 - 100%, 1.4.0 - 100%, 1.5.0 - 100%

Version: 0.4.0-s_2.10 (72fd27) / Date: 2015-11-03 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 21%, 1.1.0 - 100%, 1.2.0 - 100%, 1.3.0 - 100%, 1.4.0 - 100%, 1.5.0 - 100%

Version: 0.3.1-s_2.10 (c721f7) / Date: 2015-08-04 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 21%, 1.1.0 - 100%, 1.2.0 - 100%, 1.3.0 - 100%, 1.4.0 - 100%

Version: 0.3.1-s_2.11 (c721f7) / Date: 2015-08-04 / License: Apache-2.0 / Scala version: 2.11

Spark Scala/Java API compatibility: 1.2.0 - 100%, 1.3.0 - 100%, 1.4.0 - 100%

Version: 0.3.0-s_2.11 (8f86ed) / Date: 2015-07-28 / License: Apache-2.0 / Scala version: 2.11

Spark Scala/Java API compatibility: 1.2.0 - 100%, 1.3.0 - 100%, 1.4.0 - 100%

Version: 0.3.0-s_2.10 (8f86ed) / Date: 2015-07-28 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 21%, 1.1.0 - 100%, 1.2.0 - 100%, 1.3.0 - 100%, 1.4.0 - 100%

Version: 0.2.0-s_2.11 (3a1c2f) / Date: 2015-05-27 / License: Apache-2.0 / Scala version: 2.11

Spark Scala/Java API compatibility: 1.2.0 - 100%, 1.3.0 - 100%

Version: 0.2.0-s_2.10 (3a1c2f) / Date: 2015-05-27 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 25%, 1.1.0 - 100%, 1.2.0 - 100%, 1.3.0 - 100%

Version: 0.1.0 (8037da) / Date: 2015-03-30 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: 1.0.0 - 25%, 1.1.0 - 100%, 1.2.0 - 100%, 1.3.0 - 100%