Spark Packages

A community index of third-party packages for Apache Spark.

Showing packages 51 - 100 out of 518

spark-cloudant

Spark SQL IBM Cloudant External Datasource

@cloudant / No release yet / (1)

docker-spark

Docker container for a Spark standalone cluster.

@epahomov / No release yet / (0)

spark-libFM

An implementation of Factorization Machines (LibFM)

@zhengruifeng / No release yet / (0)

MLlib-dropout

Package adding dropout regularization to the Apache Spark MLlib project

@rakeshchalasani / No release yet / (1)

spark-infotheoretic-feature-selection

Feature Selection framework based on Information Theory that includes: mRMR, InfoGain, JMI and other commonly used FS filters.

@sramirez / Latest release: 1.4.4 (2017-09-25) / Apache-2.0 / (8)

spark-MDLP-discretization

Spark implementation of Fayyad's discretizer based on the Minimum Description Length Principle (MDLP)

@sramirez / Latest release: 1.4.1 (2017-09-25) / Apache-2.0 / (7)

spark-ml-class

Coursera Machine Learning class examples in Spark

@zinniasystems / No release yet / (0)

spark-archetype-scala

Maven archetype used to bootstrap a Spark Scala project

@mbonaci / Latest release: 0.9 (2015-04-24) / MIT / (0)

couchbase-spark-connector

Deprecated; please see couchbase/couchbase-spark-connector

@couchbaselabs / Latest release: 1.0.0 (2015-10-20) / Apache-2.0 / (1)

spark-pmml-exporter-validator

Using JPMML Evaluator to validate the PMML models exported from Spark

@selvinsource / No release yet / (1)

sbt-spark-ec2

SBT plugin for spark-ec2

@pishen / No release yet / (0)

spark-sas7bdat

Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL

@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 / (1)

spark-solr

Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

@LucidWorks / Latest release: 2.0.1 (2016-06-09) / Apache-2.0 / (1)

succinct

Spark and Spark SQL integration for Succinct

@amplab / Latest release: 0.1.8 (2019-07-10) / Apache-2.0 / (1)

sparkit-learn

PySpark + Scikit-learn = Sparkit-learn

@lensacom / No release yet / (2)

RabbitMQ-Receiver

RabbitMQ Spark Streaming receiver

@Stratio / Latest release: 0.4.0 (2016-12-20) / Apache-2.0 / (10)

streaming-matrix-factorization

Streaming Recommendation Engine using matrix factorization with user and product bias

@brkyvz / Latest release: 0.1.0 (2015-05-26) / Apache-2.0 / (2)

zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.

@cloudml / No release yet / (2)

spark-es

Elasticsearch integration for Apache Spark

@SHSE / Latest release: 1.0.7 (2016-02-04) / Apache-2.0 / (1)

mypackage

Test Project

@EronWright / Latest release: 0.0.13 (2015-06-11) / Apache-2.0 / (0)

pyspark-elastic

PySpark support for Elasticsearch

@TargetHolding / Latest release: 0.4.2 (2016-03-22) / Apache-2.0 / (1)

aerosolve

A machine learning package built for humans.

@airbnb / No release yet / (1)

dissolve-struct

Distributed solver library for large-scale structured output prediction

@dalab / No release yet / (0)

spark-streamingsql

Manipulate Apache Spark Streaming by SQL

@Intel-bigdata / No release yet / (1)

mba

Two-way association analysis

@mfawadalam / No release yet / (0)

DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet / (11)

spark-datetime

A library for exposing dateTime functions from the Joda library as SQL functions, with a DSL to build dateTime Catalyst expressions.

@SparklineData / Latest release: 0.0.2 (2015-10-29) / Apache-2.0 / (1)

adam

A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.

@bigdatagenomics / No release yet / (1)

spark-deployer

Deploy a Spark cluster in an easy way.

@pishen / Latest release: 0.5.1 (2015-06-25) / Apache-2.0 / (0)

sparkboost

A distributed implementation of AdaBoost.MH and MP-Boost using Apache Spark

@tizfa / Latest release: 0.6 (2015-07-01) / Apache-2.0 / (0)

hivemall-spark

A Hivemall wrapper for Spark

@maropu / Latest release: 0.0.6 (2016-04-07) / Apache-2.0 / (0)

elasticsearch-hadoop

Official integration between Apache Spark and Elasticsearch real-time search and analytics

@elastic / Latest release: 5.3.1 (2017-04-21) / Apache-2.0 / (3)

patchwork

Highly Scalable Grid-Density Clustering Algorithm for Spark MLlib

@thomastriplet / No release yet / (0)

spark-lda

Spark package with multiple LDA implementations

@EntilZha / No release yet / (0)

jaws-spark-sql-rest

RESTful service for running Spark SQL/Shark queries on top of Spark, with Mesos and Tachyon support.

@Atigeo / No release yet / (0)

spark-job-rest

RESTful service that enables support for multiple Spark contexts created from the same server.

@Atigeo / No release yet / (0)

sp-demo

WIP Demo Package

@brkyvz / No release yet / (0)

modelmatrix

Alternative to Spark machine learning pipeline feature extractors, focused on building sparse feature vectors.

@collectivemedia / No release yet / (1)

spark-knn-graphs

Spark algorithms for building and processing k-nn graphs

@tdebatty / Latest release: 0.13 (2016-02-17) / MIT / (1)

Spark-SQL-on-HBase

Native, optimized access to HBase data through Spark SQL/DataFrame interfaces

@Huawei-Spark / Latest release: 1.0.0 (2015-07-17) / Apache-2.0 / (1)

xpatterns-xframe

Simplified tabular data processing library for Spark

@Atigeo / No release yet / (0)

magellan

Geospatial Data Analytics on Spark

@harsha2010 / Latest release: 1.0.5-s_2.11 (2017-08-14) / Apache-2.0 / (1)

sparrow

Scala library for converting Spark rows to case classes

@ypg-data / Latest release: 0.2.0-s_2.11 (2016-03-01) / Apache-2.0 / (0)

SparkTwitterAnalysis

An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)

pyspark-notebook

PySpark notebook with Docker.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)

spark-json-relay

SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC.

@hammerlab / Latest release: 2.0.1 (2015-10-12) / Apache-2.0 / (0)

spark-streaming-gnip

An Apache Spark utility for pulling Tweets from Gnip's PowerTrack in real time

@knoldus / No release yet / (1)

spark-on-hbase

Generic solution for scanning, joining and mutating HBase tables to and from Spark RDDs.

@michal-harish / No release yet / (0)

spark-salesforce

Spark Salesforce Wave Connector

@springml / Latest release: 1.2.0 (2018-04-25) / Apache-2.0 / (2)

spark-centrality

Library for computing centrality for graph nodes

@webgeist / Latest release: 0.11 (2015-08-09) / LGPL-3.0 / (3)