mezzanine

Mezzanine is a library built on Spark Streaming that consumes data from Kafka and stores it in Hadoop.

This library was built to replace the batch-based model of Kafka consumption, in which jobs are launched periodically to consume and persist large amounts of data at a time. Mezzanine contains logic for transforming, partitioning, and compacting the consumed Kafka data before persisting it in HDFS. It was built to work with Baryon, which handles the Kafka consumption, but Mezzanine can also be used as a library with other methods of consuming from Kafka.

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages com.groupon.dse:mezzanine:1.0
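
The same `--packages` flag also works with `spark-submit`. The following is a minimal sketch, assuming a hypothetical application class `com.example.MyStreamingJob` packaged in `my-streaming-job.jar`; neither name comes from Mezzanine. The script only prints the command so it can be checked first; remove the `echo` to actually launch the job.

```shell
# Maven coordinate taken from the spark-shell example above.
MEZZANINE_PKG="com.groupon.dse:mezzanine:1.0"

# Hypothetical application class and jar; replace with your own.
APP_CLASS="com.example.MyStreamingJob"
APP_JAR="my-streaming-job.jar"

# Print the spark-submit invocation; drop `echo` to run it.
echo "$SPARK_HOME/bin/spark-submit" --packages "$MEZZANINE_PKG" --class "$APP_CLASS" "$APP_JAR"
```

With `--packages`, Spark resolves the coordinate and its dependencies at launch time, so the library does not need to be bundled into the application jar.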

sbt

In your sbt build file, add:

libraryDependencies += "com.groupon.dse" % "mezzanine" % "1.0"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>mezzanine</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>

Releases

Version: 1.0 ( 3b989b | zip | jar ) / Date: 2016-07-29 / License: BSD 3-Clause / Scala version: 2.10