maelstrom (homepage)

Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (sub-millisecond stream processing), scalable (consumes messages at Spark worker nodes), and extremely reliable.

@jeoffreylim

This library has been running stably in production environments and has proven resilient to numerous
production issues.
Thanks to [Adlogica](http://www.adlogica.com/) for sharing this project with the open source community!

## Features
- Simple framework that follows Kafka semantics and best practices.
- High performance, with latencies down to sub-millisecond levels.
- Scalable: messages are consumed on the Spark worker nodes, not on the driver.
- Throttling: specify the maximum number of messages to process per "bulk receive".
- Built-in offset management, with offsets stored in Zookeeper. Numerous Kafka monitoring tools should work out of the box.
- Fault-tolerant design: if stream processing fails, it goes back to the last processed offsets.
- Resilient to Kafka problems (automatic leader election detection, rebalance, etc.).
- Kafka connections are pooled and reused, but are always validated to confirm they are connected to the correct leader broker.
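
The throttling, offset management, and fault-tolerance features above can be illustrated with a small, self-contained simulation. This is only a sketch of the general pattern (bounded bulk receive, commit the offset after successful processing, resume from the last committed offset on failure), not Maelstrom's actual API. Every name in it (`InMemoryLog`, `OffsetStore`, `run_batch`, `demo-group`) is hypothetical, an in-memory list stands in for a Kafka partition, and a plain dict stands in for Zookeeper.

```python
from typing import Callable, Dict, List


class InMemoryLog:
    """Hypothetical stand-in for one Kafka partition: an append-only list."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = list(messages)

    def fetch(self, offset: int, max_messages: int) -> List[str]:
        # Throttling: a bulk receive never returns more than max_messages.
        return self.messages[offset:offset + max_messages]


class OffsetStore:
    """Hypothetical stand-in for Zookeeper-backed offset storage."""

    def __init__(self) -> None:
        self._offsets: Dict[str, int] = {}

    def load(self, group: str) -> int:
        return self._offsets.get(group, 0)

    def commit(self, group: str, offset: int) -> None:
        self._offsets[group] = offset


def run_batch(log: InMemoryLog, store: OffsetStore, group: str,
              max_messages: int,
              process: Callable[[List[str]], None]) -> int:
    """Process one bulk receive; return the number of messages handled."""
    start = store.load(group)
    batch = log.fetch(start, max_messages)
    if not batch:
        return 0
    # If process() raises, the commit below is skipped, so the next run
    # starts again from the last committed offset.
    process(batch)
    store.commit(group, start + len(batch))
    return len(batch)


# Usage: five messages, at most two per bulk receive, one simulated failure.
log = InMemoryLog([f"msg-{i}" for i in range(5)])
store = OffsetStore()
seen: List[str] = []
attempts = {"count": 0}


def flaky(batch: List[str]) -> None:
    attempts["count"] += 1
    if attempts["count"] == 2:
        raise RuntimeError("simulated processing failure")
    seen.extend(batch)


while True:
    try:
        if run_batch(log, store, "demo-group", 2, flaky) == 0:
            break  # nothing left to consume
    except RuntimeError:
        pass  # retry: the failed batch is fetched again from the stored offset
```

Because the offset is committed only after processing succeeds, a failure causes the same batch to be fetched again. In this sketch the failing processor raises before recording anything, so each message is recorded once; a processor that fails partway through a batch would see some messages twice, which is the usual at-least-once trade-off of this pattern.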


Tags

  • streaming
  • kafka

How to

This package doesn't have any releases published in the Spark Packages repo, or with maven coordinates supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.

Releases

No releases yet.