A community index of third-party packages for Apache Spark.

Showing packages 51 - 100 out of 512

Spark SQL IBM Cloudant External Datasource

@cloudant / No release yet / (1)

  • 1|data source
  • 1|sql


Docker container for a Spark standalone cluster.

@epahomov / No release yet / (0)

  • 1|tools
  • 1|deployment


An implementation of Factorization Machines (LibFM)

@zhengruifeng / No release yet / (0)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Package adding dropout regularization to Apache Spark MLlib project

@rakeshchalasani / No release yet / (1)

  • 1|machine learning
  • 1|mllib
  • 1|scala


Feature Selection framework based on Information Theory that includes: mRMR, InfoGain, JMI and other commonly used FS filters.

@sramirez / Latest release: 1.4.4 (2017-09-25) / Apache-2.0 / (8)

  • 3|feature-selection
  • 3|mllib
  • 3|machine learning


Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP)

@sramirez / Latest release: 1.4.1 (2017-09-25) / Apache-2.0 / (7)

  • 2|discretization
  • 2|mllib
  • 1|machine learning


Coursera Machine Learning class examples in Spark

@zinniasystems / No release yet / (0)

  • 2|ml
  • 2|machine learning
  • 1|example


Maven archetype used to bootstrap a Spark Scala project

@mbonaci / Latest release: 0.9 (2015-04-24) / MIT / (0)

  • 1|Maven
  • 1|tools
  • 1|scala


Deprecated, please see couchbase/couchbase-spark-connector

@couchbaselabs / Latest release: 1.0.0 (2015-10-20) / Apache-2.0 / (1)

  • 1|streaming
  • 1|library
  • 1|sql


Using JPMML Evaluator to validate the PMML models exported from Spark

@selvinsource / No release yet / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


SBT plugin for spark-ec2

@pishen / No release yet / (0)

  • 1|tools
  • 1|sbt
  • 1|deployment


Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL

@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 / (1)

  • 1|sas
  • 1|tools
  • 1|sql


Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

@LucidWorks / Latest release: 2.0.1 (2016-06-09) / Apache-2.0 / (1)

  • 1|ml
  • 1|data sources
  • 1|solr


Spark and Spark SQL integration for Succinct

@amplab / Latest release: 0.1.8 (2019-07-10) / Apache-2.0 / (1)

  • 1|application
  • 1|data source


PySpark + Scikit-learn = Sparkit-learn

@lensacom / No release yet / (2)

  • 1|python
  • 1|scikit-learn
  • 1|machine learning


RabbitMQ Spark Streaming receiver

@Stratio / Latest release: 0.4.0 (2016-12-20) / Apache-2.0 / (10)

  • 4|streaming


Streaming Recommendation Engine using matrix factorization with user and product bias

@brkyvz / Latest release: 0.1.0 (2015-05-26) / Apache-2.0 / (2)

  • 1|streaming
  • 1|ml
  • 1|machine learning


Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.

@cloudml / No release yet / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Elasticsearch integration for Apache Spark

@SHSE / Latest release: 1.0.7 (2016-02-04) / Apache-2.0 / (1)

  • 1|analytics
  • 1|search
  • 1|elasticsearch


Test Project

@EronWright / Latest release: 0.0.13 (2015-06-11) / Apache-2.0 / (0)


PySpark support for Elasticsearch

@TargetHolding / Latest release: 0.4.2 (2016-03-22) / Apache-2.0 / (1)

  • 1|python
  • 1|spark
  • 1|database


A machine learning package built for humans.

@airbnb / No release yet / (1)

  • 1|machine learning


Distributed solver library for large-scale structured output prediction

@dalab / No release yet / (0)

  • 1|Support Vector Machine
  • 1|Structured Prediction
  • 1|machine learning


Manipulate Apache Spark Streaming by SQL

@Intel-bigdata / No release yet / (1)

  • 1|streaming
  • 1|sql


Two way association analysis

@mfawadalam / No release yet / (0)


Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet / (11)

  • 3|API
  • 2|tools
  • 2|machine learning


A library for exposing dateTime functions from the Joda library as SQL functions, with a DSL to build dateTime Catalyst expressions.

@SparklineData / Latest release: 0.0.2 (2015-10-29) / Apache-2.0 / (1)

  • 1|spark
  • 1|sql
  • 1|dateTime


A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.

@bigdatagenomics / No release yet / (1)


Deploy Spark cluster in an easy way.

@pishen / Latest release: 0.5.1 (2015-06-25) / Apache-2.0 / (0)

  • 1|tools
  • 1|sbt
  • 1|deployment


A distributed implementation of AdaBoost.MH and MP-Boost using Apache Spark

@tizfa / Latest release: 0.6 (2015-07-01) / Apache-2.0 / (0)

  • 1|adaboost
  • 1|classification
  • 1|machine learning


A Hivemall wrapper for Spark

@maropu / Latest release: 0.0.6 (2016-04-07) / Apache-2.0 / (0)

  • 1|sql
  • 1|hive
  • 1|machine learning


Official integration between Apache Spark and Elasticsearch real-time search and analytics

@elastic / Latest release: 5.3.1 (2017-04-21) / Apache-2.0 / (3)

  • 1|search
  • 1|elasticsearch
  • 1|sql


Highly Scalable Grid-Density Clustering Algorithm for Spark MLlib

@thomastriplet / No release yet / (0)

  • 1|clustering
  • 1|spark
  • 1|machine learning


Spark package with multiple LDA implementations

@EntilZha / No release yet / (0)


RESTful service for running Spark SQL/Shark queries on top of Spark, with Mesos and Tachyon support.

@Atigeo / No release yet / (0)


RESTful service that enables support for multiple Spark contexts created from the same server.

@Atigeo / No release yet / (0)


WIP Demo Package

@brkyvz / No release yet / (0)


Alternative to Spark machine learning pipeline feature extractors, focused on building sparse feature vectors.

@collectivemedia / No release yet / (1)

  • 2|feature extraction
  • 2|machine learning


Spark algorithms for building and processing k-nn graphs

@tdebatty / Latest release: 0.13 (2016-02-17) / MIT / (1)

  • 1|graph
  • 1|machine learning


Native, optimized access to HBase data through Spark SQL/DataFrame interfaces

@Huawei-Spark / Latest release: 1.0.0 (2015-07-17) / Apache-2.0 / (1)

  • 1|hbase


Simplified tabular data processing library for Spark

@Atigeo / No release yet / (0)


Geo Spatial Data Analytics on Spark

@harsha2010 / Latest release: 1.0.5-s_2.11 (2017-08-14) / Apache-2.0 / (1)

  • 2|geospatial
  • 2|data source
  • 2|sql


Scala library for converting Spark rows to case classes

@ypg-data / Latest release: 0.2.0-s_2.11 (2016-03-01) / Apache-2.0 / (0)

  • 1|sql
  • 1|library
  • 1|scala


An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)

  • 1|streaming
  • 1|sbt
  • 1|scala


PySpark notebook with Docker.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)

  • 2|python
  • 1|docker
  • 1|pyspark


SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC.

@hammerlab / Latest release: 2.0.1 (2015-10-12) / Apache-2.0 / (0)


An Apache Spark utility for pulling Tweets from Gnip's PowerTrack in real time

@knoldus / No release yet / (1)

  • 1|streaming
  • 1|data source
  • 1|scala


Generic solution for scanning, joining and mutating HBase tables to and from Spark RDDs.

@michal-harish / No release yet / (0)


Spark Salesforce Wave Connector

@springml / Latest release: 1.2.0 (2018-04-25) / Apache-2.0 / (2)

  • 1|salesforce
  • 1|data source


Library for computing centrality for graph nodes

@webgeist / Latest release: 0.11 (2015-08-09) / LGPL-3.0 / (3)

  • 2|graph