sparkpipe-core

Modular, non-linear data pipeline framework for Spark

Staying productive on the Spark platform depends on writing jobs that are modular, testable and reusable.
Sparkpipe provides a standard way to express and connect the components of a Spark job, so that they can be assembled in series (or in a more complex dependency graph of operations), reused and shared. It lets you connect traditional ETL operations with machine learning and natural language processing, through to output and data visualization.
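
To make the idea of a non-linear pipeline concrete, here is a minimal plain-Scala sketch of stages that are chained in series, branched and then merged. It does not use Sparkpipe or Spark: `Stage`, `to`, `merge` and `run` are hypothetical names invented for this illustration, and the real Sparkpipe API may differ.

```scala
// Hypothetical illustration only: `Stage` is NOT part of the Sparkpipe API.
final class Stage[A](compute: () => A) {
  // Cache the result so a stage shared by several branches is computed once.
  lazy val result: A = compute()

  // Chain an operation onto this stage, producing a new downstream stage.
  def to[B](op: A => B): Stage[B] = new Stage[B](() => op(result))

  // Evaluate this stage and everything it depends on.
  def run(): A = result
}

object Stage {
  // Wrap a starting value as the head of a pipeline.
  def apply[A](value: => A): Stage[A] = new Stage[A](() => value)

  // Join two upstream stages into one, which makes the graph non-linear.
  def merge[A, B, C](left: Stage[A], right: Stage[B])(op: (A, B) => C): Stage[C] =
    new Stage[C](() => op(left.result, right.result))
}

object Example {
  // A reusable operation: parse comma-separated integers.
  val parse: String => Seq[Int] = _.split(",").toSeq.map(_.trim.toInt)

  // Series: source -> parse.
  val numbers: Stage[Seq[Int]] = Stage("1, 2, 3, 4").to(parse)

  // Branch: two independent stages read from the same upstream stage.
  val total: Stage[Int] = numbers.to[Int](_.sum)
  val count: Stage[Int] = numbers.to[Int](_.size)

  // Merge: combine both branches into a single result.
  val mean: Stage[Double] = Stage.merge(total, count)((t, c) => t.toDouble / c)

  def main(args: Array[String]): Unit =
    println(mean.run())
}
```

Nothing is computed until `run()` is called, and because each stage caches its result, the shared `numbers` stage is parsed once even though two branches depend on it.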

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages software.uncharted.sparkpipe:sparkpipe-core:0.9.7

sbt

In your sbt build file, add:

libraryDependencies += "software.uncharted.sparkpipe" % "sparkpipe-core" % "0.9.7"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>software.uncharted.sparkpipe</groupId>
    <artifactId>sparkpipe-core</artifactId>
    <version>0.9.7</version>
  </dependency>
</dependencies>

Releases

Version: 0.9.7 ( 2aff5e | zip | jar ) / Date: 2016-02-24 / License: BSD 3-Clause

Version: 0.9.6 ( b330de | zip | jar ) / Date: 2016-02-19 / License: Apache-2.0

Version: 0.9.5 ( 1fb943 | zip | jar ) / Date: 2016-01-24 / License: Apache-2.0

Version: 0.9.4 ( c414d9 | zip | jar ) / Date: 2016-01-11 / License: Apache-2.0

Version: 0.9.3 ( 7008fc | zip ) / Date: 2016-01-08 / License: Apache-2.0