generalized-kmeans-clustering

This project generalizes the Spark MLlib K-Means clusterer to support arbitrary distance functions.

This project decouples the distance metric from the clusterer implementation, so an end user can define a custom distance function in just a few lines of code. We demonstrate this by implementing several Bregman divergences, including the squared Euclidean distance, the Kullback-Leibler divergence, the logistic loss divergence, the Itakura-Saito divergence, and the generalized I-divergence. We also implement a symmetric version of the Kullback-Leibler divergence that is also a metric. Pull requests offering additional distance functions are welcome.
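To illustrate what decoupling the distance from the clusterer can look like, here is a minimal single-machine sketch in plain Python. It is not this package's API and does not use Spark: the names `squared_euclidean`, `generalized_i_divergence`, `itakura_saito`, `assign`, `update`, and `kmeans` are hypothetical and chosen only for this example. The sketch assumes strictly positive coordinates for the two non-Euclidean divergences, and it relies on the known property that the arithmetic mean minimizes the total Bregman divergence from a cluster's points to its center.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# A divergence takes (point, center) and returns a non-negative cost.
Divergence = Callable[[Vector, Vector], float]


def squared_euclidean(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def generalized_i_divergence(x: Vector, y: Vector) -> float:
    # Assumes strictly positive coordinates in both vectors.
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))


def itakura_saito(x: Vector, y: Vector) -> float:
    # Assumes strictly positive coordinates in both vectors.
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(x, y))


def assign(points: List[Vector], centers: List[Vector],
           divergence: Divergence) -> List[int]:
    """Label each point with the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda j: divergence(p, centers[j]))
        for p in points
    ]


def update(points: List[Vector], labels: List[int],
           centers: List[Vector]) -> List[List[float]]:
    """Move each center to the mean of its points.

    The mean is the optimal center for any Bregman divergence measured
    as divergence(point, center). Empty clusters keep their old center.
    """
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centers.append(list(old))
            continue
        dim = len(old)
        new_centers.append(
            [sum(p[d] for p in members) / len(members) for d in range(dim)]
        )
    return new_centers


def kmeans(points: List[Vector], centers: List[Vector],
           divergence: Divergence, iterations: int = 10):
    """Lloyd-style iteration; the clusterer never inspects the divergence."""
    labels = assign(points, centers, divergence)
    for _ in range(iterations):
        centers = update(points, labels, centers)
        labels = assign(points, centers, divergence)
    return centers, labels
```

In this sketch, switching the clustering objective means passing a different function, for example `kmeans(points, initial_centers, itakura_saito)`, while the assignment and update steps stay unchanged.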


  • machine learning
  • clustering
  • mllib

How to

This package doesn't have any releases published in the Spark Packages repo, or with maven coordinates supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.


No releases yet.