rich-spark

Rich Spark adds more to Apache Spark

@mashin-io

This package extends Apache Spark. It currently has two subpackages, main and streaming.
The streaming subpackage extends the Spark Streaming API with built-in scheduling of Spark jobs (both batch and streaming). Instead of deploying and configuring a separate scheduling service (e.g. Apache Oozie, Mesos Chronos, Linux cron), you schedule Spark jobs from within the job code, which makes the scheduling semantics part of the job semantics.
Jobs can be scheduled not only on a time basis but also on events from a variety of event sources, such as filesystem events and REST API calls from a web admin console. The extension also integrates with ReactiveX, which enables scheduling on complex events.
For more information, see the docs: https://github.com/mashin-io/rich-spark/blob/master/docs/reactive-spark-doc.md
The main subpackage provides minor API extensions: rdd.scanLeft, rdd.scanRight, sc.httpRDD for creating RDDs from REST API calls, and ParallelSGD, a parallelized version of mini-batch stochastic gradient descent (see SPARK-14880).
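
For readers unfamiliar with scan operations, here is a minimal sketch of what scanLeft and scanRight conventionally compute, written against plain Python lists. It assumes rdd.scanLeft and rdd.scanRight follow the same semantics as the Scala collection methods of the same names. This is an assumption, not something this page confirms, and the helper names scan_left and scan_right are hypothetical rather than part of rich-spark.

```python
from itertools import accumulate
import operator


def scan_left(items, zero, op):
    """Running results from the left: [zero, op(zero, x0), op(op(zero, x0), x1), ...]."""
    return list(accumulate(items, op, initial=zero))


def scan_right(items, zero, op):
    """Running results from the right, so the last element is zero."""
    # Walk the input backwards, applying op(x, acc) to keep the operand order
    # of a right fold, then restore the original orientation.
    backwards = accumulate(reversed(items), lambda acc, x: op(x, acc), initial=zero)
    return list(backwards)[::-1]


# Illustrative data only (not taken from the package docs).
prefix_sums = scan_left([1, 2, 3, 4], 0, operator.add)   # [0, 1, 3, 6, 10]
suffix_sums = scan_right([1, 2, 3, 4], 0, operator.add)  # [10, 9, 7, 4, 0]
```

A scan returns one more element than its input because the initial value is included. On an RDD the same result would have to be assembled across partitions, and how rich-spark does that is not described here.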


Tags

  • ml
  • library
  • streaming
  • machine learning
  • scala
  • core
  • java

How to

This package has no releases published in the Spark Packages repo and no Maven coordinates supplied. You may have to build it from source, or it may simply be a script. To use this Spark Package, follow the instructions in the README.

Releases

No releases yet.