Receiver-Based Reliable Low-Level Kafka-Spark Consumer for Spark Streaming. Built-in Back-Pressure Controller. ZK-Based Offset Management. WAL-Less Recovery. Custom Message Interceptor.
@dibbhatt
This utility helps pull messages from Kafka using Spark Streaming, with better handling of Kafka offsets and failures. The consumer implements a custom reliable receiver that fetches messages from Kafka and stores every received block in the Spark BlockManager. It commits the offsets of a processed batch only after that Spark Streaming batch completes.
This Low Level Consumer API is compatible with Kafka versions 0.8, 0.9, 0.10, and 0.11, and with all Spark versions, including the latest Spark 2.2.0.
Salient Features of Kafka Spark Consumer
-- This consumer uses Zookeeper to store the processed offset for each Kafka partition, which enables recovery in case of failure.
-- Support for Kafka security.
-- This consumer does not require a WAL (Write-Ahead Log) to recover from Driver or Executor failures.
-- This consumer has a built-in PID (Proportional, Integral, Derivative) based rate controller for handling back-pressure.
-- This consumer can use a Message Interceptor to pre-process Kafka messages before they are written to the Spark BlockManager.
-- This consumer now supports Kafka consumer-lag checking tools such as ConsumerOffsetChecker.
For more details about how to use this consumer, please refer to https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md
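To illustrate the flow described above (launch receivers, process each batch, then commit offsets to ZooKeeper), here is a minimal Java driver sketch. It is an assumption-laden outline based on the project README: the `ReceiverLauncher` and `ProcessedOffsetManager` entry points, the property names, and the ZooKeeper/topic values are placeholders to verify against the linked documentation for your version of the library.

```java
import java.util.Properties;

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import consumer.kafka.MessageAndMetadata;
import consumer.kafka.ProcessedOffsetManager;
import consumer.kafka.ReceiverLauncher;

public class KafkaSparkConsumerExample {
  public static void main(String[] args) throws Exception {
    // ZooKeeper connection and topic details (hypothetical values) --
    // the same ZooKeeper quorum is used to store processed offsets.
    Properties props = new Properties();
    props.put("zookeeper.hosts", "localhost");
    props.put("zookeeper.port", "2181");
    props.put("kafka.topic", "mytopic");
    props.put("kafka.consumer.id", "my-consumer-id");

    SparkConf conf = new SparkConf().setAppName("KafkaSparkConsumer");
    JavaStreamingContext jsc =
        new JavaStreamingContext(conf, Durations.seconds(30));

    // Launch the reliable receivers; each received block is stored
    // in the Spark BlockManager at the given storage level.
    int numberOfReceivers = 3;
    JavaDStream<MessageAndMetadata<byte[]>> stream =
        ReceiverLauncher.launch(
            jsc, props, numberOfReceivers, StorageLevel.MEMORY_ONLY());

    // Track the highest processed offset for each partition ...
    JavaPairDStream<Integer, Iterable<Long>> partitionOffsets =
        ProcessedOffsetManager.getPartitionOffset(stream, props);

    // ... run the per-batch processing logic ...
    stream.foreachRDD(rdd ->
        rdd.foreach(mmeta ->
            System.out.println(new String(mmeta.getPayload()))));

    // ... and commit offsets to ZooKeeper after the batch completes,
    // which is what makes WAL-less recovery possible.
    ProcessedOffsetManager.persists(partitionOffsets, props);

    jsc.start();
    jsc.awaitTermination();
  }
}
```

Because offsets are committed only after a batch is processed, a restarted driver resumes from the last committed offsets in ZooKeeper rather than replaying from a write-ahead log.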
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages dibbhatt:kafka-spark-consumer:1.0.14
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "dibbhatt/kafka-spark-consumer:1.0.14"
Otherwise, add:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "dibbhatt" % "kafka-spark-consumer" % "1.0.14"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>dibbhatt</groupId>
    <artifactId>kafka-spark-consumer</artifactId>
    <version>1.0.14</version>
  </dependency>
</dependencies>

<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>