k-means-pipeline
An ML pipeline to cluster DataFrames with categorical values using K-Means
@knoldus
This library provides a KMeansPipeline object that clusters data, including data with categorical fields, using the K-Means clustering algorithm in Spark MLlib.
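This listing does not document the KMeansPipeline API, so the snippet below is not the library's implementation. It is only a minimal sketch of the general technique, built from Spark ML's own stages: index each categorical column, one-hot encode it, assemble everything into a feature vector, and run K-Means. It assumes Spark 2.x or later (for SparkSession). The object name CategoricalKMeansSketch, the cluster helper, and the sample column names are hypothetical and exist only for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper for illustration; NOT part of k-means-pipeline.
object CategoricalKMeansSketch {

  // Clusters df into k groups using both categorical (string) and numeric (double) columns.
  def cluster(df: DataFrame,
              categoricalCols: Seq[String],
              numericCols: Seq[String],
              k: Int): DataFrame = {
    // For each categorical column: string -> numeric index -> one-hot vector.
    val encoders: Seq[PipelineStage] = categoricalCols.flatMap { c =>
      Seq[PipelineStage](
        new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx"),
        new OneHotEncoder().setInputCol(s"${c}_idx").setOutputCol(s"${c}_vec")
      )
    }

    // Combine the encoded categorical vectors and the numeric columns into one feature vector.
    val assembler = new VectorAssembler()
      .setInputCols((categoricalCols.map(c => s"${c}_vec") ++ numericCols).toArray)
      .setOutputCol("features")

    // Standard Spark ML K-Means; the cluster id is written to the "cluster" column.
    val kmeans = new KMeans()
      .setK(k)
      .setSeed(1L)
      .setFeaturesCol("features")
      .setPredictionCol("cluster")

    val stages: Array[PipelineStage] =
      (encoders ++ Seq[PipelineStage](assembler, kmeans)).toArray

    new Pipeline().setStages(stages).fit(df).transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("categorical-kmeans-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with two categorical fields and one numeric field.
    val df = Seq(
      ("red", "small", 1.0),
      ("red", "small", 1.2),
      ("blue", "large", 9.5),
      ("blue", "large", 10.1),
      ("green", "small", 1.1),
      ("green", "large", 9.8)
    ).toDF("color", "size", "weight")

    cluster(df, Seq("color", "size"), Seq("weight"), k = 2)
      .select("color", "size", "weight", "cluster")
      .show()

    spark.stop()
  }
}
```

One-hot encoding is used here so that K-Means does not treat the arbitrary category indices as ordered numeric values. Numeric columns are left unscaled in this sketch; scaling them first is often worthwhile when their ranges differ widely from the 0/1 encoded fields.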
How to
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages knoldus:k-means-pipeline:0.0.1
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "knoldus/k-means-pipeline:0.0.1"
Otherwise,
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "knoldus" % "k-means-pipeline" % "0.0.1"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>knoldus</groupId>
    <artifactId>k-means-pipeline</artifactId>
    <version>0.0.1</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>
Releases
Version: 0.0.1 ( 7aeefa | zip | jar ) / Date: 2016-07-30 / License: Apache-2.0 / Scala version: 2.10