SparkAffinityPropagation

Affinity Propagation on Spark

Affinity Propagation (AP) is a graph clustering algorithm based on the concept of "message passing" between data points. Unlike clustering algorithms such as k-means or k-medoids, AP does not require the number of clusters to be determined or estimated before it runs. AP was developed by Frey and Dueck; see their paper [1] for details.

Affinity Propagation on Spark implements the Affinity Propagation algorithm on the Spark cluster computing system. By leveraging a computing cluster, you can run this clustering algorithm on large-scale data sets.

[1] Brendan J. Frey; Delbert Dueck (2007). "Clustering by passing messages between data points". Science. 315 (5814): 972-976.
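
To make the "message passing" idea concrete, here is a minimal single-machine sketch of the responsibility and availability updates described in [1]. It is not this package's Spark implementation or API: the function names `similarity_matrix` and `affinity_propagation`, the dense list-of-lists layout, the 1-D toy points, the median preference, and the damping and iteration defaults are all assumptions chosen for illustration.

```python
from statistics import median


def similarity_matrix(points):
    """Negative squared distance between 1-D points (hypothetical helper).

    The diagonal holds the "preference"; here it is assumed to be the
    median of the off-diagonal similarities.
    """
    n = len(points)
    S = [[-float((points[i] - points[k]) ** 2) for k in range(n)] for i in range(n)]
    preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Dense sketch of AP; returns the exemplar index chosen for each point.

    Assumes at least two points. R[i][k] is the responsibility sent from
    point i to candidate exemplar k; A[i][k] is the availability sent back.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # Availability: a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        #               a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility is not clipped
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point picks the candidate k that maximizes a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


points = [0, 1, 2, 10, 11, 12]  # two well-separated groups
labels = affinity_propagation(similarity_matrix(points))
print(labels)  # exemplar index chosen for each point
```

Points that end up with the same exemplar index form one cluster, so the number of clusters falls out of the messages rather than being supplied up front. The dense matrices here cost O(n²) memory, which is the kind of limit a distributed implementation is meant to get past.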

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages viirya:SparkAffinityPropagation:1.0

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "viirya/SparkAffinityPropagation:1.0"

Otherwise,

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "viirya" % "SparkAffinityPropagation" % "1.0"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>viirya</groupId>
    <artifactId>SparkAffinityPropagation</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>

Releases

Version: 1.0 (commit 290dde) / Date: 2017-07-29 / License: MIT / Scala version: 2.10