patchwork

Highly Scalable Grid-Density Clustering Algorithm for Spark MLlib

PatchWork is a novel, highly scalable grid-density clustering algorithm. It has linear complexity and near-linear horizontal scalability. As a result, PatchWork can cluster a billion points in only a few minutes, a 40x improvement over Spark MLlib's native implementation of the well-known K-Means.
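
To illustrate the general idea behind grid-density clustering, here is a minimal single-machine sketch in plain Python. It is not PatchWork's implementation and does not use Spark; the function name `grid_density_cluster` and the parameters `cell_size` and `min_points` are hypothetical, chosen for illustration only. The sketch bins points into fixed-size grid cells, keeps the cells that hold enough points, and merges adjacent dense cells into clusters.

```python
from collections import defaultdict, deque
from itertools import product


def grid_density_cluster(points, cell_size, min_points):
    """Toy grid-density clustering: one label per point, -1 means noise.

    Illustrative assumption only, not the PatchWork algorithm itself.
    """
    # 1. Map each point to the integer coordinates of its grid cell.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    # 2. Keep only the cells that hold at least min_points points.
    dense = {key for key, members in cells.items() if len(members) >= min_points}

    # 3. Flood-fill over adjacent dense cells; each connected group is a cluster.
    labels = [-1] * len(points)
    visited = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            # Visit every neighbouring cell, diagonals included.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
        cluster_id += 1
    return labels
```

Each point is touched once when it is binned and once when it is labelled, so for a fixed number of dimensions the work in this sketch grows linearly with the number of points. Points that fall in sparse cells keep the label -1.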


  • machine learning
  • spark
  • clustering

How to

This package has no releases published in the Spark Packages repo and no Maven coordinates supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.


No releases yet.