GMM

Gaussian Mixture Model Implementation in Pyspark

@FlytxtRnD

The GMM algorithm models the entire data set as a finite mixture of Gaussian distributions, each parameterized by a mean vector, a covariance matrix, and a mixture weight. The probability that each point belongs to each cluster is computed along with the cluster statistics. This distributed implementation of GMM in PySpark estimates the parameters using the Expectation-Maximization (EM) algorithm and considers only a diagonal covariance matrix for each component.
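
To make the description above concrete, here is a minimal single-machine sketch of EM for a Gaussian mixture with diagonal covariances, written in plain Python. It is not this package's code or API: the function names (`log_gaussian_diag`, `e_step`, `m_step`, `fit_gmm_diag`), the `init_means` parameter, the unit-variance initialization, and the variance floor are all assumptions made for illustration. In a distributed setting, the per-point sums in the M-step would presumably be aggregated across partitions rather than computed in a local loop.

```python
import math
import random


def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian whose covariance is diagonal (per-dimension variances)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def e_step(points, weights, means, variances):
    """E-step: resp[i][j] = P(component j | point i), plus the total log-likelihood."""
    resp, log_lik = [], 0.0
    for x in points:
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        # log-sum-exp keeps the normalization numerically stable
        norm = top + math.log(sum(math.exp(l - top) for l in logs))
        resp.append([math.exp(l - norm) for l in logs])
        log_lik += norm
    return resp, log_lik


def m_step(points, resp, min_var=1e-6):
    """M-step: re-estimate weights, means, and diagonal variances from responsibilities."""
    n, d, k = len(points), len(points[0]), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        # Effective number of points in component j (guarded against zero)
        nj = max(sum(r[j] for r in resp), 1e-12)
        mean = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                for t in range(d)]
        # Assumed safeguard: floor each variance so a component cannot collapse
        var = [max(sum(r[j] * (x[t] - mean[t]) ** 2
                       for r, x in zip(resp, points)) / nj, min_var)
               for t in range(d)]
        weights.append(nj / n)
        means.append(mean)
        variances.append(var)
    return weights, means, variances


def fit_gmm_diag(points, k, iterations=20, init_means=None, seed=0):
    """Run a fixed number of EM iterations; returns parameters, responsibilities, log-likelihoods."""
    d = len(points[0])
    if init_means is None:
        # Assumed initialization: pick k data points at random as starting means
        init_means = random.Random(seed).sample(points, k)
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    variances = [[1.0] * d for _ in range(k)]
    history = []
    for _ in range(iterations):
        resp, log_lik = e_step(points, weights, means, variances)
        history.append(log_lik)
        weights, means, variances = m_step(points, resp)
    return weights, means, variances, resp, history
```

Calling `fit_gmm_diag(points, k)` on a list of equal-length numeric tuples returns the mixture weights, the mean vectors, the per-dimension variances, the responsibilities from the final E-step, and the log-likelihood recorded at each iteration. Restricting each component to a diagonal covariance means only one variance per dimension is stored and updated, which keeps the per-component statistics small.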


  • python
  • mllib

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages FlytxtRnD:GMM:0.1


If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "FlytxtRnD/GMM:0.1"

Otherwise, without the plugin, add:


resolvers += "Spark Packages Repo" at ""

libraryDependencies += "FlytxtRnD" % "GMM" % "0.1"


In your pom.xml, add:
  <!-- list of dependencies -->
  <!-- list of other repositories -->


Version: 0.1 ( c76463 | zip | jar ) / Date: 2015-04-07 / License: EPL-1.0

Version: v0.1 ( c76463 | zip ) / Date: 2014-11-27 / License: BSD 3-Clause