Receiver-Based Reliable Low-Level Kafka-Spark Consumer for Spark Streaming. Built-in Back-Pressure Controller, ZooKeeper-Based Offset Management, WAL-Less Recovery, Custom Message Interceptor
@dibbhatt
This utility helps pull messages from Kafka using Spark Streaming, with better handling of Kafka offsets and failures. The consumer implements a custom reliable receiver that uses the low-level Kafka consumer API to fetch messages from Kafka and stores every received block in the Spark BlockManager. The receiver commits the Kafka offsets of received blocks once the store to the Spark BlockManager has succeeded.
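The receiver is driven by a set of connection properties (ZooKeeper endpoint for offset storage, topic, and consumer group). The sketch below builds such a `Properties` object; the key names (`zookeeper.hosts`, `zookeeper.port`, `kafka.topic`, `kafka.consumer.id`) are assumptions taken from the project README, so verify them against the version of kafka-spark-consumer you actually use.

```java
import java.util.Properties;

public class ConsumerConfig {
    // Builds the consumer's configuration. All property key names below are
    // assumed from the project README, not guaranteed by this page.
    public static Properties build() {
        Properties props = new Properties();
        props.put("zookeeper.hosts", "localhost");           // ZK host(s) used for offset storage
        props.put("zookeeper.port", "2181");                 // ZK client port
        props.put("kafka.topic", "mytopic");                 // Kafka topic to consume
        props.put("kafka.consumer.id", "my-consumer-group"); // consumer group id
        return props;
    }
}
```

These properties are then handed to the consumer's receiver launcher together with the StreamingContext, as described in the project README linked below.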
The low-level consumer API is compatible with Kafka versions 0.8, 0.9, 0.10, and 0.11, and with all Spark versions, including the latest Spark 2.2.0.
Salient Features of Kafka Spark Consumer
-- This consumer uses ZooKeeper to store the consumed and processed offset for each Kafka partition, which helps recovery in case of failure.
-- This consumer does not require a WAL to recover from driver or executor failures. It stores the processed offsets after every batch interval, so in case of failure the consumer can resume from the last processed offset.
-- This consumer has a built-in PID (Proportional, Integral, Derivative) rate controller for handling back-pressure.
-- This consumer can apply a message interceptor to pre-process Kafka messages before they are written to the Spark BlockManager.
-- This consumer now supports Kafka consumer-lag checker tools such as ConsumerOffsetChecker.
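To illustrate the PID-based back-pressure idea mentioned above: a PID controller compares the current ingestion rate with the rate at which the last batch was actually processed, and lowers or raises the ingestion rate accordingly. The following is a generic, self-contained sketch of that technique, not the project's actual implementation; gains and signatures are illustrative.

```java
// Generic PID rate-controller sketch (illustrative only; the consumer's
// real implementation may differ in formula and tuning).
public class PidRateController {
    private final double kp, ki, kd;   // proportional, integral, derivative gains
    private double accumulatedError = 0.0;
    private double lastError = 0.0;

    public PidRateController(double kp, double ki, double kd) {
        this.kp = kp;
        this.ki = ki;
        this.kd = kd;
    }

    /**
     * currentRate and processingRate are in messages/sec; batchSeconds is the
     * batch interval. Returns the adjusted ingestion rate.
     */
    public double nextRate(double currentRate, double processingRate, double batchSeconds) {
        double error = currentRate - processingRate;      // positive => ingesting faster than processing
        accumulatedError += error * batchSeconds;         // integral term
        double dError = (error - lastError) / batchSeconds; // derivative term
        lastError = error;
        double newRate = currentRate - kp * error - ki * accumulatedError - kd * dError;
        return Math.max(newRate, 0.0);                    // rate can never be negative
    }
}
```

With only the proportional term active (ki = kd = 0), a consumer ingesting 100 msg/s while processing 80 msg/s would be throttled down toward 80 msg/s on the next batch.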
For more details about how to use this consumer, please refer to https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages dibbhatt:kafka-spark-consumer:1.0.12
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "dibbhatt/kafka-spark-consumer:1.0.12"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "dibbhatt" % "kafka-spark-consumer" % "1.0.12"
Maven: In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>dibbhatt</groupId>
    <artifactId>kafka-spark-consumer</artifactId>
    <version>1.0.12</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>