GMM (homepage)
Gaussian Mixture Model Implementation in PySpark
@FlytxtRnD
The GMM algorithm models the entire data set as a finite mixture of Gaussian distributions, each parameterized by a mean vector, a covariance matrix, and a mixture weight. The probability that each point belongs to each cluster is computed along with the cluster statistics. This distributed PySpark implementation of GMM estimates the parameters with the Expectation-Maximization algorithm and supports only a diagonal covariance matrix for each component.
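To make the description above concrete, here is a minimal single-machine sketch of Expectation-Maximization for a Gaussian mixture with diagonal covariances. It is written in plain Python for illustration and is not this package's API: the function names (log_gaussian_diag, e_step, m_step, fit_gmm), the init_means parameter, and the variance floor min_var are all assumptions introduced here. In a distributed setting the per-point work in the E-step and the weighted sums in the M-step would presumably be computed per partition and then combined.

```python
import math
import random


def log_gaussian_diag(x, mean, var):
    """Log density at x of a Gaussian whose covariance is diag(var)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def e_step(points, weights, means, variances):
    """Return resp[n][j] = P(component j | point n) and the data log-likelihood."""
    resp, total_ll = [], 0.0
    for x in points:
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        log_norm = top + math.log(sum(math.exp(l - top) for l in logs))
        resp.append([math.exp(l - log_norm) for l in logs])
        total_ll += log_norm
    return resp, total_ll


def m_step(points, resp, min_var=1e-6):
    """Re-estimate weights, means and diagonal variances from responsibilities."""
    n, d, k = len(points), len(points[0]), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        # Effective number of points in component j (floored to avoid division by zero).
        nk = max(sum(r[j] for r in resp), 1e-12)
        mean = [sum(r[j] * x[i] for r, x in zip(resp, points)) / nk
                for i in range(d)]
        var = [max(sum(r[j] * (x[i] - mean[i]) ** 2
                       for r, x in zip(resp, points)) / nk, min_var)
               for i in range(d)]
        weights.append(nk / n)
        means.append(mean)
        variances.append(var)
    return weights, means, variances


def fit_gmm(points, k, iterations=50, seed=0, init_means=None):
    """Run a fixed number of EM iterations; init_means is a hypothetical convenience."""
    n, d = len(points), len(points[0])
    rng = random.Random(seed)
    means = [list(m) for m in (init_means or rng.sample(points, k))]
    # Every component starts from the global per-dimension variance.
    g_mean = [sum(x[i] for x in points) / n for i in range(d)]
    g_var = [max(sum((x[i] - g_mean[i]) ** 2 for x in points) / n, 1e-6)
             for i in range(d)]
    variances = [list(g_var) for _ in range(k)]
    weights = [1.0 / k] * k
    history = []  # log-likelihood before each M-step
    for _ in range(iterations):
        resp, ll = e_step(points, weights, means, variances)
        history.append(ll)
        weights, means, variances = m_step(points, resp)
    return weights, means, variances, history
```

Calling fit_gmm(points, k) on a list of equal-length numeric lists returns the mixture weights, the per-component means, the per-component diagonal variances, and the log-likelihood recorded at each iteration. Because only the diagonal of each covariance matrix is kept, the density factorizes over dimensions, which is why log_gaussian_diag is a simple sum.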
How to
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages FlytxtRnD:GMM:0.1
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "FlytxtRnD/GMM:0.1"
Otherwise,
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "FlytxtRnD" % "GMM" % "0.1"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>FlytxtRnD</groupId>
    <artifactId>GMM</artifactId>
    <version>0.1</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>
Releases
Version: 0.1 ( c76463 | zip | jar ) / Date: 2015-04-07 / License: EPL-1.0
Version: v0.1 ( c76463 | zip ) / Date: 2014-11-27 / License: BSD 3-Clause