Qubole Sparklens tool for performance tuning Apache Spark
@qubole / (2)
Sparklens is a profiling tool for Spark with built-in Spark Scheduler simulator. Its primary goal is to make it easy to understand the scalability limits of spark applications. It helps in understanding how efficiently is a given spark application using the compute resources provided to it. May be your application will run faster with more executors and may be it wont. Sparklens can answer this question by looking at a single run of your application.
It helps you narrow down to few stages (or driver, or skew or lack of tasks) which are limiting your application from scaling out and provides contextual information about what could be going wrong with these stages. Primarily it helps you approach spark application tuning as a well defined method/process instead of something you learn by trial and error, saving both developer and compute time.
Tags (No tags yet, login to add one. )
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages qubole:sparklens:0.2.1-s_2.10
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "qubole/sparklens:0.2.1-s_2.10"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven" libraryDependencies += "qubole" % "sparklens" % "0.2.1-s_2.10"
MavenIn your pom.xml, add:
<dependencies> <!-- list of dependencies --> <dependency> <groupId>qubole</groupId> <artifactId>sparklens</artifactId> <version>0.2.1-s_2.10</version> </dependency> </dependencies> <repositories> <!-- list of other repositories --> <repository> <id>SparkPackagesRepo</id> <url>http://dl.bintray.com/spark-packages/maven</url> </repository> </repositories>