spark-skewjoin (homepage)
Joins for skewed datasets in Spark
@tresata
This library adds the skewJoin operation to RDD[(K, V)] where possible (certain implicit typeclasses are required for K and V). A skew join is just like a normal join, except that keys with large numbers of values are not processed by a single task but are instead spread out across many tasks.
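As a rough sketch of what usage looks like (the Dsl import below and the exact implicits required for the key type, such as an Ordering and an Algebird CMSHasher, are assumptions; check the project README for the authoritative API):

import org.apache.spark.{SparkConf, SparkContext}
// Assumed import: the DSL that adds skewJoin to pair RDDs.
import com.tresata.spark.skewjoin.Dsl._

object SkewJoinExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("skewjoin-example"))

    // The left side has a heavily skewed key: "hot" carries far more values than "cold".
    val left  = sc.parallelize((1 to 1000000).map(i => ("hot", i)) ++ Seq(("cold", 1)))
    val right = sc.parallelize(Seq(("hot", "a"), ("cold", "b")))

    // skewJoin behaves like a normal inner join, but values for heavy keys
    // are spread across many tasks instead of being handled by a single one.
    val joined = left.skewJoin(right)
    println(joined.count())

    sc.stop()
  }
}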
How to
Include this package in your Spark applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages com.tresata:spark-skewjoin_2.10:0.2.0
sbt
In your sbt build file, add:
libraryDependencies += "com.tresata" % "spark-skewjoin_2.10" % "0.2.0"
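Since both Scala 2.10 and 2.11 artifacts are listed under Releases, sbt's cross-version operator can be used instead of hard-coding the suffix (a standard sbt convention, not specific to this package):

libraryDependencies += "com.tresata" %% "spark-skewjoin" % "0.2.0"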
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.tresata</groupId>
    <artifactId>spark-skewjoin_2.10</artifactId>
    <version>0.2.0</version>
  </dependency>
</dependencies>
Releases
Version: 0.2.0-s_2.10 ( e37803 | zip | jar ) / Date: 2015-11-13 / License: Apache-2.0 / Scala version: 2.10
Version: 0.2.0-s_2.11 ( e37803 | zip | jar ) / Date: 2015-11-13 / License: Apache-2.0 / Scala version: 2.11
Version: 0.1.0 ( b6d9a9 | zip | jar ) / Date: 2015-08-29 / License: Apache-2.0 / Scala version: 2.10