sparkpipe-core

Modular, non-linear data pipeline framework for Spark

Staying productive on the Spark platform means writing jobs in a modular, testable and reusable way.
Sparkpipe provides a standard way to express and connect the components of a Spark job, so that they can be assembled in series (or in a more complex dependency graph of operations), reused and shared. Traditional ETL operations can be connected with machine learning and natural language processing, through to output and data visualization.
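The idea of assembling reusable components in series, or as a small dependency graph, can be illustrated with a self-contained sketch. It does not use the sparkpipe-core API: the names `Pipe`, `to`, `run`, `merge`, `tokenize`, `wordCounts` and `longest` are hypothetical, and plain Scala collections stand in for Spark DataFrames so the example runs without a cluster. Each node caches its result, so an upstream stage shared by two branches is evaluated only once.

```scala
// Hypothetical sketch: Pipe, to, run and merge are illustrative names only,
// NOT the sparkpipe-core API. Plain Scala collections stand in for DataFrames.
object PipeSketch {

  // A node in the pipeline graph. The result is held in a lazy val, so a node
  // shared by several downstream branches is computed at most once.
  final class Pipe[A] private (compute: () => A) {
    private lazy val cached: A = compute()

    // Connect a new operation in series after this node.
    def to[B](op: A => B): Pipe[B] = new Pipe[B](() => op(cached))

    // Evaluate this node (and, transitively, everything upstream of it).
    def run(): A = cached
  }

  object Pipe {
    // Wrap a (lazily evaluated) source as the root of a pipeline.
    def apply[A](source: => A): Pipe[A] = new Pipe[A](() => source)

    // Join two branches into one node, giving a non-linear (DAG-shaped) pipeline.
    def merge[A, B, C](left: Pipe[A], right: Pipe[B])(op: (A, B) => C): Pipe[C] =
      new Pipe[C](() => op(left.run(), right.run()))
  }

  // Reusable operations: ordinary functions that can be unit-tested in isolation.
  val tokenize: Seq[String] => Seq[String] =
    lines => lines.flatMap(_.toLowerCase.split("\\s+")).filter(_.nonEmpty)

  val wordCounts: Seq[String] => Map[String, Int] =
    words => words.groupBy(identity).map { case (w, ws) => w -> ws.size }

  val longest: Seq[String] => String =
    words => words.maxBy(_.length)

  // Counts how many times the source node is evaluated (to show caching).
  var sourceLoads = 0

  // Source -> tokenize, then two branches (counts, longest word) joined at the end.
  def buildPipeline(lines: Seq[String]): Pipe[(Map[String, Int], String)] = {
    val words  = Pipe { sourceLoads += 1; lines }.to(tokenize) // shared upstream node
    val counts = words.to(wordCounts)                          // branch 1
    val top    = words.to(longest)                             // branch 2
    Pipe.merge(counts, top)((c, t) => (c, t))                  // join the branches
  }

  def main(args: Array[String]): Unit = {
    val (counts, top) =
      buildPipeline(Seq("Spark pipes data", "pipes connect data")).run()
    println(counts)
    println(top)
    println(s"source evaluated $sourceLoads time(s)")
  }
}
```

Because each operation is a plain function, `tokenize`, `wordCounts` and `longest` can be tested on their own and reused in other pipelines; the per-node caching is what lets one source feed several branches without being recomputed.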


Tags

  • etl
  • data processing

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages software.uncharted.sparkpipe:sparkpipe-core:0.9.7

sbt

In your sbt build file, add:

libraryDependencies += "software.uncharted.sparkpipe" % "sparkpipe-core" % "0.9.7"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>software.uncharted.sparkpipe</groupId>
    <artifactId>sparkpipe-core</artifactId>
    <version>0.9.7</version>
  </dependency>
</dependencies>

Releases

Version: 0.9.7 ( 2aff5e | zip | jar ) / Date: 2016-02-24 / License: BSD 3-Clause

Version: 0.9.6 ( b330de | zip | jar ) / Date: 2016-02-19 / License: Apache-2.0

Version: 0.9.5 ( 1fb943 | zip | jar ) / Date: 2016-01-24 / License: Apache-2.0

Version: 0.9.4 ( c414d9 | zip | jar ) / Date: 2016-01-11 / License: Apache-2.0

Version: 0.9.3 ( 7008fc | zip ) / Date: 2016-01-08 / License: Apache-2.0