Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
@qubole
Streaminglens is a profiling tool for Spark Structured Streaming applications running in micro-batch mode. Every five minutes, Streaminglens analyzes the execution of the last micro-batch to give an overall picture of the streaming pipeline's health. During this analysis, it calculates the critical time to complete a micro-batch: the minimum time the Spark job would take if it were run with an unlimited number of executors. Streaminglens also takes the expected micro-batch SLA as input and expects every micro-batch to complete within that SLA. By comparing the critical time, the actual batch running time and the expected micro-batch SLA, Streaminglens classifies the state of the streaming pipeline as Optimum, Underprovisioned, Overprovisioned or Unhealthy, and gives appropriate recommendations for tuning the Spark cluster.
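To make the comparison concrete, here is a minimal Scala sketch of one plausible way to compute the critical time and map the three timings to a pipeline state. It assumes that the stages of a micro-batch run one after another, that driver-side time cannot be parallelized, and that "overprovisioned" means finishing well under the SLA. `StageRun`, `criticalTimeMs`, `classify` and `overprovisionRatio` are hypothetical names; this is not the Streaminglens API, and its actual rules and thresholds may differ.

```scala
// Hypothetical, simplified model of the health check described above;
// not the Streaminglens implementation or API.
object StreamingHealthSketch {

  // Assumption: a stage is summarized by the durations (ms) of its tasks,
  // and the stages of one micro-batch run sequentially.
  final case class StageRun(taskDurationsMs: Seq[Long])

  // With unlimited executors every task of a stage starts at once, so a stage
  // lasts as long as its slowest task; driver time is added as-is because
  // extra executors cannot speed it up.
  def criticalTimeMs(driverTimeMs: Long, stages: Seq[StageRun]): Long =
    driverTimeMs + stages.map { s =>
      if (s.taskDurationsMs.isEmpty) 0L else s.taskDurationsMs.max
    }.sum

  sealed trait PipelineState
  case object Optimum extends PipelineState
  case object Underprovisioned extends PipelineState
  case object Overprovisioned extends PipelineState
  case object Unhealthy extends PipelineState

  // overprovisionRatio is a hypothetical threshold: batches that finish in less
  // than this fraction of the SLA are treated as having spare capacity.
  def classify(criticalMs: Long, batchMs: Long, slaMs: Long,
               overprovisionRatio: Double = 0.5): PipelineState =
    if (criticalMs > slaMs) Unhealthy          // even unlimited executors would miss the SLA
    else if (batchMs > slaMs) Underprovisioned // SLA is reachable, but more executors are needed
    else if (batchMs < overprovisionRatio * slaMs) Overprovisioned // large unused headroom
    else Optimum
}
```

For example, with 200 ms of driver time and two stages whose slowest tasks take 400 ms and 300 ms, the critical time is 900 ms. A 4,000 ms batch against a 3,000 ms SLA is then Underprovisioned rather than Unhealthy, because adding executors could still bring it under the SLA.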
Include this package in your Spark applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages com.qubole:spark-streaminglens_2.11:0.5.3
sbt
In your sbt build file, add:
libraryDependencies += "com.qubole" % "spark-streaminglens_2.11" % "0.5.3"
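The `_2.11` suffix in the artifact name is the Scala binary version the package was built for, so the consuming project must also compile against Scala 2.11. Assuming a standard sbt project, an equivalent build.sbt sketch pins the Scala version and lets `%%` append the suffix automatically; the patch version `2.11.12` is an illustrative choice, not a requirement stated by the package.

```scala
// build.sbt (sketch): the project must use Scala 2.11 to match the artifact.
scalaVersion := "2.11.12"

// %% appends the Scala binary version ("_2.11"), so this resolves to the
// same artifact as "spark-streaminglens_2.11".
libraryDependencies += "com.qubole" %% "spark-streaminglens" % "0.5.3"
```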
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.qubole</groupId>
    <artifactId>spark-streaminglens_2.11</artifactId>
    <version>0.5.3</version>
  </dependency>
</dependencies>