spark-sorted
Secondary sort and streaming reduce for Spark
@tresata
spark-sorted is a library that aims to make non-reduce operations on very large groups in Spark possible, including operations that need the values for each key in sorted order. It relies on Spark's sort-based shuffle and never materializes the group for a given key. Instead, each group is represented by consecutive rows within a partition, and those rows are processed with a map-like, iterator-based streaming operation.
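The streaming idea described above can be illustrated without Spark. The following is a minimal plain-Scala sketch, not the library's API: the object `StreamingGroupSketch` and the function `streamGroups` are hypothetical names introduced here, and a single partition is modeled as an ordinary iterator of (key, value) rows that is assumed to be already sorted by key and then by value.

```scala
// Illustrative sketch only; names are hypothetical, not part of spark-sorted.
object StreamingGroupSketch {

  /** Applies `f` to the values of each run of consecutive equal keys.
    * Assumes `rows` is already sorted by key (and by value, if order matters).
    * No group is ever collected into memory; `f` sees a live iterator. */
  def streamGroups[K, V, W](rows: Iterator[(K, V)])(f: Iterator[V] => Iterator[W]): Iterator[(K, W)] = {
    val buffered = rows.buffered
    new Iterator[(K, W)] {
      private var current: Iterator[(K, W)] = Iterator.empty // output for the current key
      private var pending: Iterator[V] = Iterator.empty      // values of the current key not yet read

      def hasNext: Boolean = {
        var exhausted = false
        while (!current.hasNext && !exhausted) {
          // Skip values that `f` left unread so the cursor lands on the next key.
          while (pending.hasNext) pending.next()
          if (buffered.hasNext) {
            val key = buffered.head._1
            val values = new Iterator[V] {
              def hasNext: Boolean = buffered.hasNext && buffered.head._1 == key
              def next(): V = {
                if (!hasNext) throw new NoSuchElementException("end of group")
                buffered.next()._2
              }
            }
            pending = values
            current = f(values).map(w => (key, w))
          } else {
            exhausted = true
          }
        }
        current.hasNext
      }

      def next(): (K, W) = {
        if (!hasNext) throw new NoSuchElementException("end of partition")
        current.next()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Sorting by (key, value) stands in for the sort-based shuffle.
    val rows = Seq(("b", 3), ("a", 5), ("a", 1), ("b", 2), ("a", 4)).sorted
    // Keep the two smallest values per key, reading each group as a stream.
    val result = streamGroups(rows.iterator)(_.take(2)).toList
    println(result) // List((a,1), (a,4), (b,2), (b,3))
  }
}
```

Because the values arrive as an iterator, the function passed in can stop early (as `take(2)` does here) or keep only constant state, which is what lets a group be far larger than memory.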
How to
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages com.tresata:spark-sorted_2.11:0.4.0
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "tresata/spark-sorted:0.4.0"
Otherwise,
libraryDependencies += "com.tresata" % "spark-sorted_2.11" % "0.4.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.tresata</groupId>
    <artifactId>spark-sorted_2.11</artifactId>
    <version>0.4.0</version>
  </dependency>
</dependencies>
Releases
Version: 0.4.0-s_2.11 ( 72fd27 | zip | jar ) / Date: 2015-11-03 / License: Apache-2.0 / Scala version: 2.11
Version: 0.4.0-s_2.10 ( 72fd27 | zip | jar ) / Date: 2015-11-03 / License: Apache-2.0 / Scala version: 2.10
Version: 0.3.1-s_2.10 ( c721f7 | zip | jar ) / Date: 2015-08-04 / License: Apache-2.0 / Scala version: 2.10
Version: 0.3.1-s_2.11 ( c721f7 | zip | jar ) / Date: 2015-08-04 / License: Apache-2.0 / Scala version: 2.11
Version: 0.3.0-s_2.11 ( 8f86ed | zip | jar ) / Date: 2015-07-28 / License: Apache-2.0 / Scala version: 2.11
Version: 0.3.0-s_2.10 ( 8f86ed | zip | jar ) / Date: 2015-07-28 / License: Apache-2.0 / Scala version: 2.10
Version: 0.2.0-s_2.11 ( 3a1c2f | zip | jar ) / Date: 2015-05-27 / License: Apache-2.0 / Scala version: 2.11
Version: 0.2.0-s_2.10 ( 3a1c2f | zip | jar ) / Date: 2015-05-27 / License: Apache-2.0 / Scala version: 2.10
Version: 0.1.0 ( 8037da | zip | jar ) / Date: 2015-03-30 / License: Apache-2.0 / Scala version: 2.10