A community index of third-party packages for Apache Spark.
Showing packages 1 - 50 out of 515
mllib-grid-search
An example project for doing grid search in MLlib
@spark-ml / Latest release: 0.0.1 (2014-11-27) / BSD 3-Clause / (2)
spark-avro
Integration utilities for using Spark with Apache Avro data
@databricks / Latest release: 4.0.0-s_2.11 (2017-10-30) / Apache-2.0 / (13)
spark-redshift
Redshift Data Source for Apache Spark
@databricks / Latest release: 3.0.0-preview1 (2016-11-01) / Apache-2.0 / (3)
kafka-spark-consumer
High-performance Kafka consumer for Spark Streaming. Supports multi-topic fetch and Kafka security. Reliable offset management in ZooKeeper. No data loss. No dependency on HDFS or WAL. Built-in PID rate controller. Supports a message handler and an offset lag checker.
@dibbhatt / Latest release: 2.1.0 (2019-08-28) / Apache-2.0 / (7)
thunder
Large-scale neural data analysis with Spark
@freeman-lab / Latest release: 0.4.1 (2014-11-27) / BSD 3-Clause / (6)
GMM
Gaussian Mixture Model implementation in PySpark
@FlytxtRnD / Latest release: 0.1 (2015-04-07) / EPL-1.0 / (5)
spark-csv
Spark SQL CSV data source
@databricks / Latest release: 1.5.0-s_2.11 (2016-09-07) / Apache-2.0 / (10)
spark-indexedrdd
An efficient updatable key-value store for Apache Spark
@amplab / Latest release: 0.4.0 (2017-01-11) / Apache-2.0 / (1)
killrweather
KillrWeather is a reference application (in progress) showing how to easily leverage and integrate Apache Spark, Apache Cassandra, and Apache Kafka for fast, streaming computations on time series data in asynchronous Akka event-driven environments.
@killrweather / No release yet / (1)
spark-hbase
Integration utilities for using Spark with Apache HBase data
@haosdent / No release yet / (1)
spark_hbase
A Scala example of reading data stored in HBase with Spark, and an example converter for Python
@GenTang / No release yet / (3)
sparkling
A Clojure library for Apache Spark: fast, fully featured, and developer-friendly
@gorillalabs / Latest release: 1.0.0 (2014-12-31) / EPL-1.0 / (3)
spark-kernel
A kernel that enables applications to interact with Apache Spark.
@ibm-et / No release yet / (0)
pyspark-pictures
Learn the PySpark API through pictures and simple examples
@jkthompson / No release yet / (0)
deep-spark
Connecting Apache Spark with different data stores
@Stratio / Latest release: 0.7.0-RC1 (2015-01-14) / Apache-2.0 / (20)
streaming-cep-engine
Streaming CEP Engine Powered by Spark Streaming & Siddhi
@Stratio / Latest release: 0.6.2 (2015-01-14) / Apache-2.0 / (19)
generalized-kmeans-clustering
This project generalizes the Spark MLlib K-Means clusterer to support arbitrary distance functions
@derrickburns / No release yet / (3)
sparkling-water
Sparkling Water provides H2O algorithms inside a Spark cluster
@h2oai / Latest release: 1.4.3 (2015-07-06) / Apache-2.0 / (2)
spark-ml-streaming
Visualize streaming machine learning in Spark
@freeman-lab / No release yet / (1)
spark-hbase-connector
Connect Spark to HBase for reading and writing data with ease
@nerdammer / Latest release: 1.0.3 (2016-04-20) / Apache-2.0 / (3)
spark-testing-base
Base classes to use when writing tests with Spark
@holdenk / Latest release: 2.2.2_0.11.0 (2018-12-23) / Apache-2.0 / (10)
pyspark-csv
An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parses CSV data into a SchemaRDD. No installation required; simply include pyspark_csv.py via SparkContext.
@seahboonsiew / No release yet / (1)
spark-mongodb
MongoDB data source for Spark SQL
@Stratio / Latest release: 0.12.0 (2016-08-31) / Apache-2.0 / (14)
pyspark-cassandra
PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.
@TargetHolding / Latest release: 0.3.5 (2016-03-30) / Apache-2.0 / (1)
demo-scala-python
A Spark Package Template
@brkyvz / Latest release: 1.2-s_2.10 (2016-05-25) / Apache-2.0 / (1)
sbt-spark-package
Sbt plugin for Spark packages
@databricks / Latest release: 0.2.4 (2016-07-15) / Apache-2.0 / (3)
spark-scalding
Use Cascading Taps and Scalding DSL with Spark
@tresata / Latest release: 0.5.0-s_2.10 (2015-11-13) / Apache-2.0 / (0)
spark-sorted
Secondary sort and streaming reduce for Spark
@tresata / Latest release: 0.4.0-s_2.11 (2015-11-03) / Apache-2.0 / (0)
spark-kafka
Low level integration of Spark and Kafka
@tresata / Latest release: 0.6.0-s_2.10 (2015-11-13) / Apache-2.0 / (0)
spark-cassandra-connector
Connects Spark to Cassandra
@datastax / Latest release: 2.4.0-s_2.11 (2018-11-29) / Apache-2.0 / (14)
spark-package-cmd-tool
A command line tool for Spark packages
@databricks / Latest release: 0.3.0 (2015-03-17) / Apache-2.0 / (1)
spawncamping-dds
Data-Driven Spark allows quick data exploration based on Apache Spark
@FRosner / No release yet / (0)
spark-power-bi
Power BI API adapter for Apache Spark
@granturing / Latest release: 1.5.0_0.0.7 (2015-09-13) / Apache-2.0 / (0)
meetup-stream
Spark Streaming, Machine Learning and meetup.com streaming API.
@actions / No release yet / (1)
demo-python-sp
Pure Python package used for testing Spark Packages
@brkyvz / Latest release: 0.4.2 (2016-02-14) / Apache-2.0 / (0)
spark-mrmr-feature-selection
Feature selection based on information gain: maximum relevancy minimum redundancy
@wxhC3SC6OPm8M1HXboMy / No release yet / (0)
spark-csv2sql
Handy routine to import CSV files as tables in Spark SQL
@wxhC3SC6OPm8M1HXboMy / No release yet / (0)
spark-sequoiadb
Spark connector for SequoiaDB
@SequoiaDB / Latest release: 1.12-s_2.11 (2015-03-30) / Apache-2.0 / (2)
spark-notebook
Use Apache Spark straight from the browser
@andypetrella / Latest release: v0.4.0 (2015-03-29) / Apache-2.0 / (2)