Create composable data processing pipelines in Spark, and execute them on a cluster using simple Scala code
@springnz
This project aims to make it easier to use Spark successfully in production environments.
In particular, it addresses two specific requirements.
1. Creating composable data processing pipelines that are easy to reuse and test in isolation.
Composable data pipelines are built using a simple monadic abstraction called a SparkOperation. These SparkOperations can be chained together easily, but can also be tested in isolation. In addition, wrappers for some commonly used data sources and frameworks are provided.
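The chaining described above can be sketched as a small monadic wrapper. This is an illustrative stand-in, not the project's actual `SparkOperation` source: the `Ctx` stub takes the place of a real `org.apache.spark.SparkContext` so the sketch is self-contained, and the step names (`load`, `square`) are hypothetical.

```scala
// Minimal sketch of a monadic pipeline step: each step is a function
// from a context to a result, with map/flatMap for chaining.
// Ctx is a stand-in for SparkContext so the sketch runs without Spark.
final case class Ctx(appName: String)

final case class SparkOperation[A](run: Ctx => A) {
  def map[B](f: A => B): SparkOperation[B] =
    SparkOperation(ctx => f(run(ctx)))
  def flatMap[B](f: A => SparkOperation[B]): SparkOperation[B] =
    SparkOperation(ctx => f(run(ctx)).run(ctx))
}

object PipelineExample {
  // Two reusable steps: load some data, then transform it.
  val load: SparkOperation[Seq[Int]] = SparkOperation(_ => Seq(1, 2, 3))
  def square(xs: Seq[Int]): SparkOperation[Seq[Int]] =
    SparkOperation(_ => xs.map(x => x * x))

  // Steps compose with a for-comprehension and run as one pipeline.
  val pipeline: SparkOperation[Seq[Int]] =
    for {
      xs <- load
      ys <- square(xs)
    } yield ys

  def main(args: Array[String]): Unit = {
    // Each step can also be run (and unit-tested) in isolation.
    assert(load.run(Ctx("test")) == Seq(1, 2, 3))
    println(pipeline.run(Ctx("example"))) // List(1, 4, 9)
  }
}
```

Because each step only becomes concrete when `run` is called with a context, individual steps can be exercised in a unit test with a local context while the composed pipeline runs unchanged on a cluster.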
2. Providing a lightweight mechanism for launching and executing Spark processes on a cluster.
Execution on a cluster takes place by extending the spark-submit mechanism provided by Spark. spark-submit is used to start a remote Akka actor system that the client application can use to run Spark job requests through a simple Scala futures-based interface. It also allows for long-lived Spark contexts and low-latency job execution.
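From the client's point of view, the interface described above amounts to submitting a job request and getting a `Future` of the result back. The sketch below is hypothetical: the names (`JobClient`, `JobRequest`, `submit`) are illustrative rather than the project's actual API, and the stub runs the "job" locally instead of messaging a remote actor system started via spark-submit.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical futures-based client interface: submit a job request,
// receive a Future of the result. In the real mechanism the request
// would be sent to a remote Akka actor system holding a long-lived
// SparkContext; here the stub just computes locally.
final case class JobRequest(name: String, input: Seq[Int])

class JobClient(implicit ec: ExecutionContext) {
  def submit(req: JobRequest): Future[Int] =
    Future(req.input.sum)
}

object ClientExample {
  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val client = new JobClient
    val result: Future[Int] = client.submit(JobRequest("sum", Seq(1, 2, 3)))
    // Because the remote context stays alive between jobs, repeated
    // submissions avoid per-job startup cost; each call is just
    // another future to compose or await.
    println(Await.result(result, 5.seconds)) // 6
  }
}
```

Keeping the context alive between submissions is what gives the low-latency behaviour: the cost of starting a Spark context is paid once, not per job.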
See the GitHub homepage for detailed information.
This package doesn't have any releases published in the Spark Packages repo, or with maven coordinates supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.