A community index of third-party packages for Apache Spark.

Showing packages 1 - 50 out of 105 for search "tags:"Machine Learning""

Large-scale neural data analysis with Spark

@freeman-lab / Latest release: 0.4.1 (2014-11-27) / BSD 3-Clause / (6)

  • 3|neuroscience
  • 2|python
  • 2|machine learning


This project generalizes the Spark MLlib K-Means clusterer to support arbitrary distance functions

@derrickburns / No release yet / (3)

  • 1|clustering
  • 1|mllib
  • 1|machine learning


Sparkling Water provides H2O algorithms inside a Spark cluster

@h2oai / Latest release: 1.4.3 (2015-07-06) / Apache-2.0 / (2)

  • 1|h2o
  • 1|algorithms
  • 1|machine learning


Visualize streaming machine learning in Spark

@freeman-lab / No release yet / (1)

  • 1|streaming
  • 1|machine learning
  • 1|visualization


An implementation of Factorization Machines (LibFM)

@zhengruifeng / No release yet / (0)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Package adding dropout regularization to the Apache Spark MLlib project

@rakeshchalasani / No release yet / (1)

  • 1|machine learning
  • 1|mllib
  • 1|scala


Feature Selection framework based on Information Theory that includes: mRMR, InfoGain, JMI and other commonly used FS filters.

@sramirez / Latest release: 1.4.4 (2017-09-25) / Apache-2.0 / (8)

  • 3|feature-selection
  • 3|mllib
  • 3|machine learning


Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP)

@sramirez / Latest release: 1.4.1 (2017-09-25) / Apache-2.0 / (7)

  • 2|discretization
  • 2|mllib
  • 1|machine learning


Coursera Machine Learning class examples in Spark

@zinniasystems / No release yet / (0)

  • 2|ml
  • 2|machine learning
  • 1|example


Using JPMML Evaluator to validate the PMML models exported from Spark

@selvinsource / No release yet / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


PySpark + Scikit-learn = Sparkit-learn

@lensacom / No release yet / (2)

  • 1|python
  • 1|scikit-learn
  • 1|machine learning


Streaming Recommendation Engine using matrix factorization with user and product bias

@brkyvz / Latest release: 0.1.0 (2015-05-26) / Apache-2.0 / (2)

  • 1|streaming
  • 1|ml
  • 1|machine learning


Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.

@cloudml / No release yet / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


A machine learning package built for humans.

@airbnb / No release yet / (1)

  • 1|machine learning


Distributed solver library for large-scale structured output prediction

@dalab / No release yet / (0)

  • 1|Support Vector Machine
  • 1|Structured Prediction
  • 1|machine learning


Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet / (11)

  • 3|API
  • 2|tools
  • 2|machine learning


A distributed implementation of AdaBoost.MH and MP-Boost using Apache Spark

@tizfa / Latest release: 0.6 (2015-07-01) / Apache-2.0 / (0)

  • 1|adaboost
  • 1|classification
  • 1|machine learning


A Hivemall wrapper for Spark

@maropu / Latest release: 0.0.6 (2016-04-07) / Apache-2.0 / (0)

  • 1|sql
  • 1|hive
  • 1|machine learning


Highly Scalable Grid-Density Clustering Algorithm for Spark MLlib

@thomastriplet / No release yet / (0)

  • 1|clustering
  • 1|spark
  • 1|machine learning


Alternative to Spark machine learning pipeline feature extractors, focused on building sparse feature vectors.

@collectivemedia / No release yet / (1)

  • 2|feature extraction
  • 2|machine learning


Spark algorithms for building and processing k-nn graphs

@tdebatty / Latest release: 0.13 (2016-02-17) / MIT / (1)

  • 1|graph
  • 1|machine learning


Distributed t-SNE via Apache Spark

@saurfang / No release yet / (1)

  • 1|machine learning


Machine learning over Twitter's stream, using Apache Spark, a web server and a Lightning graph server.

@giorgioinf / Latest release: 0.2.0 (2016-06-19) / GPL-3.0 / (0)

  • 1|ml
  • 1|example
  • 1|streaming


A Stanford CoreNLP wrapper for Apache Spark

@databricks / Latest release: 0.4.0-spark2.4-scala2.11 (2018-11-16) / GPL-3.0 / (2)

  • 2|NLP
  • 2|machine learning
  • 1|NER


TFOCS for Spark, a Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)

@databricks / No release yet / (1)

  • 1|machine learning
  • 1|optimization
  • 1|convex


This is a prototype implementation of Bisecting K-Means Clustering on Spark.

@yu-iskw / Latest release: 0.1.1 (2015-08-28) / Apache-2.0 / (0)

  • 1|clustering
  • 1|machine learning
  • 1|scala


DistML provides a supplement to MLlib to support model parallelism on Spark

@intel-machine-learning / No release yet / (1)

  • 1|parameter server
  • 1|machine learning


Deep Learning for Spark ML

@deeplearning4j / Latest release: 0.4-rc3.4 (2015-10-02) / Apache-2.0 / (1)

  • 1|Spark ML
  • 1|machine learning


Implementation of Random Ferns for Apache Spark

@CeON / Latest release: 0.2.0 (2015-10-08) / Apache-2.0 / (0)

  • 3|machine learning


Distributed Topic Modeling on Apache Spark

@intel-analytics / No release yet / (1)

  • 1|graph
  • 1|LDA
  • 1|machine learning


Linear algebra operators for Apache Spark MLlib's linalg package

@brkyvz / Latest release: 0.1.0 (2015-09-09) / Apache-2.0 / (1)

  • 1|linear algebra
  • 1|lazy
  • 1|machine learning


A neural network implementation in Scala

@nearbydelta / No release yet / (0)

  • 1|neural network
  • 1|machine learning


Docker-based, End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark Streaming, ML, MLlib, GraphX, Kafka, Cassandra, Redis, Apache Zeppelin, Spark-Notebook, iPython/Jupyter Notebook, Tableau, H2O Flow, Tachyon,

@fluxcapacitor / No release yet / (3)

  • 2|streaming
  • 2|kafka
  • 1|machine learning


Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (python and scala)

@blebreton / No release yet / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Spark implementation of Nearest Neighbours Mean Shift using LSH

@Kybe67 / No release yet / (1)

  • 1|lsh
  • 1|machine learning


Gradient boosting trees with an arbitrary user-defined loss function

@rotationsymmetry / Latest release: 0.2.1-s_2.10 (2015-11-01) / Apache-2.0 / (0)

  • 1|machine learning


Large-scale Machine Learning using Apache Spark

@project-mandolin / No release yet / (0)

  • 1|machine learning


Estus Scientific Library

@EstusDev / No release yet / (0)

  • 1|machine learning


k-Nearest Neighbors algorithm on Spark

@saurfang / Latest release: 0.3.0 (2020-02-06) / Apache-2.0 / (1)

  • 2|ml
  • 2|machine learning


Popular ML Datasets for Spark ML (MNIST, IRIS, CIFAR)

@cookieai / Latest release: 0.1.0 (2015-12-22) / Apache-2.0 / (0)

  • 1|data source
  • 1|machine learning


dllib is a deep learning module running on Apache Spark.

@Lewuathe / Latest release: 0.0.9 (2017-01-10) / Apache-2.0 / (1)

  • 1|deep learning
  • 1|machine learning


A collection of fundamental statistics implemented on Apache Spark

@hhbyyh / No release yet / (0)

  • 1|statistics
  • 1|machine learning


Assess binary classifier calibration (i.e., how well classifier outputs match observed class proportions) in Spark

@robert-dodier / No release yet / (0)

  • 1|machine learning


Scikit-learn integration package for Apache Spark

@databricks / Latest release: 0.2.3 (2017-09-29) / BSD 3-Clause / (1)

  • 2|machine learning
  • 1|ml
  • 1|scikit-learn


A Distributed Evolutionary Multivariate Discretizer (DEMD)

@sramirez / Latest release: 1.0 (2016-02-04) / Apache-2.0 / (2)

  • 1|discretization
  • 1|machine learning


kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data.

@JMailloH / Latest release: 3.0 (2016-07-12) / Apache-2.0 / (4)

  • 2|ml
  • 2|mllib
  • 2|machine learning


PCARD ensemble method. Ensemble of decision trees based on Random Discretization and Principal Components Analysis.

@djgg / Latest release: 1.3 (2018-04-05) / Apache-2.0 / (3)

  • 1|machine learning
  • 1|ensemble
  • 1|mllib


Approximate nearest neighbor search using locality-sensitive hashing

@karlhigley / Latest release: 0.2.2 (2016-07-05) / MIT / (0)

  • 1|lsh
  • 1|machine learning


Scalable deep learning running Caffe inside Spark executors with peer-to-peer communication

@yahoo / No release yet / (1)

  • 1|deep learning
  • 1|caffe
  • 1|machine learning


Spark MLlib wrapper around Snowball stemming

@master / Latest release: 0.2.1 (2018-11-28) / BSD 2-Clause / (0)

  • 1|machine learning