k-means-pipeline

An ML pipeline to cluster DataFrames with categorical values using K-Means

@knoldus

This library provides a KMeansPipeline object that clusters data, including data with categorical fields, using the K-Means clustering algorithm in Spark MLlib.
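
The page does not show KMeansPipeline's API or say how it handles categorical fields, so the following is only a minimal sketch of the general idea, not the library's implementation. It assumes categorical columns are one-hot encoded into 0/1 indicators before plain K-Means (Lloyd's algorithm) runs on the resulting numeric vectors. The names `one_hot_encode`, `k_means`, and the sample rows are hypothetical and exist only for illustration; the sketch uses the Python standard library rather than Spark.

```python
import random


def one_hot_encode(rows, categorical, numeric):
    """Turn dict rows into numeric vectors (hypothetical helper, not the library's API).

    Numeric fields are copied as floats; each categorical field becomes one
    0/1 indicator per distinct value seen in the data.
    """
    # Sort the distinct values so the vector layout is deterministic.
    levels = {c: sorted({str(r[c]) for r in rows}) for c in categorical}
    vectors = []
    for r in rows:
        v = [float(r[c]) for c in numeric]
        for c in categorical:
            v.extend(1.0 if str(r[c]) == level else 0.0 for level in levels[c])
        vectors.append(v)
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Made-up sample data: one categorical field and one numeric field.
rows = [
    {"color": "red", "size": 1.0},
    {"color": "red", "size": 1.2},
    {"color": "blue", "size": 9.0},
    {"color": "blue", "size": 9.5},
]
vectors = one_hot_encode(rows, categorical=["color"], numeric=["size"])
assignments, centroids = k_means(vectors, k=2)
```

In this sketch the two "red" rows and the two "blue" rows end up in separate clusters. Because indicator columns only take the values 0 and 1, numeric fields with a large range can dominate the distance, so scaling the numeric columns first is often worthwhile.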


Tags

  • ml
  • etl
  • kmeans

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages knoldus:k-means-pipeline:0.0.1

sbt

If you use the sbt-spark-package plugin, add the following to your sbt build file:

spDependencies += "knoldus/k-means-pipeline:0.0.1"

Otherwise, add the resolver and the dependency directly:

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "knoldus" % "k-means-pipeline" % "0.0.1"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>knoldus</groupId>
    <artifactId>k-means-pipeline</artifactId>
    <version>0.0.1</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>

Releases

Version: 0.0.1 (commit 7aeefa) / Date: 2016-07-30 / License: Apache-2.0 / Scala version: 2.10