Basic framework utilities to quickly start writing production-ready Apache Spark applications
@tupol
This project contains some basic utilities that help set up a Spark application project.
Together with the configuration framework, SparkRunnable and SparkApp make it easy to create Spark applications whose configuration can be managed through configuration files or application parameters.
The IO frameworks for reading and writing data frames add extra convenience for setting up batch jobs that transform various types of files.
Last but not least, there are many utility functions that provide convenience for loading resources, dealing with schemas and so on.
Most of the common features are also implemented as decorators for the main Spark classes, such as SparkContext, DataFrame and StructType, and are available by importing org.tupol.spark.implicits._.
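To illustrate what "decorator" means here, the following is a minimal sketch of the Scala implicit-class pattern applied to StructType. The names SketchImplicits, StructTypeSketchOps and nullableFieldNames are hypothetical and are not part of spark-utils; only the Spark types are real. It assumes Spark SQL is on the classpath and can be pasted into spark-shell.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical names throughout; this is not the spark-utils implementation.
object SketchImplicits {
  // Wraps a StructType so that extra methods appear to be defined on it.
  implicit class StructTypeSketchOps(schema: StructType) {
    // Illustrative helper: names of the fields that allow null values.
    def nullableFieldNames: Seq[String] =
      schema.fields.filter(_.nullable).map(_.name).toSeq
  }
}

// Importing the members brings the decorator into scope.
import SketchImplicits._

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))

// The decorator method is now callable directly on the Spark class.
val nullableNames = schema.nullableFieldNames
```

A single wildcard import is enough to enable such methods, which is presumably why the library exposes its decorators through one implicits object.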
The main utilities and frameworks available:
SparkApp & SparkRunnable
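As a rough picture of how a runnable and an application can be split, the sketch below uses plain Scala stand-ins. SketchRunnable, SketchApp, CopyContext and CopyApp are hypothetical names, the signatures are assumptions rather than the actual SparkRunnable and SparkApp traits, and the Spark session is left out so the example runs without any dependencies.

```scala
import scala.util.Try

// Hypothetical stand-ins; NOT the actual spark-utils traits or signatures.

// A runnable unit of work: given a context (the application configuration),
// produce a result.
trait SketchRunnable[Context, Result] {
  def run(context: Context): Result
}

// An application: it has a name, knows how to build its context from
// application parameters, and wires everything together in main.
trait SketchApp[Context, Result] extends SketchRunnable[Context, Result] {
  def appName: String
  def createContext(params: Map[String, String]): Try[Context]

  // Turn "key=value" arguments into a map; other arguments are ignored.
  def parseArgs(args: Array[String]): Map[String, String] =
    args.map(_.split("=", 2)).collect { case Array(k, v) => k -> v }.toMap

  def main(args: Array[String]): Unit = {
    val outcome = createContext(parseArgs(args)).map(run)
    println(s"$appName finished with: $outcome")
  }
}

// Example configuration for an imaginary file-copy job.
final case class CopyContext(input: String, output: String)

object CopyApp extends SketchApp[CopyContext, String] {
  val appName = "CopyApp"

  // Fails if a required parameter is missing.
  def createContext(params: Map[String, String]): Try[CopyContext] =
    Try(CopyContext(params("input"), params("output")))

  def run(context: CopyContext): String =
    s"would copy ${context.input} to ${context.output}"
}
```

Keeping configuration parsing in createContext and the job logic in run means the logic can be tested by calling run with a hand-built context, without going through main.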
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages org.tupol:spark-utils_2.11:0.4.0
In your sbt build file, add:
libraryDependencies += "org.tupol" % "spark-utils_2.11" % "0.4.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>org.tupol</groupId>
    <artifactId>spark-utils_2.11</artifactId>
    <version>0.4.0</version>
  </dependency>
</dependencies>