Highly Scalable Grid-Density Clustering Algorithm for Spark MLLib

PatchWork is a novel, highly scalable grid-density clustering algorithm for Spark MLLib. It has linear complexity and near-linear horizontal scalability. As a result, PatchWork can cluster a billion points in only a few minutes, a 40x improvement over Spark MLLib's native implementation of the well-known K-Means algorithm.
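The text above does not describe PatchWork's internals. The sketch below is a generic, single-machine illustration of grid-density clustering in general, not PatchWork's Spark implementation. It shows why this family of methods can run in linear time. Each point is assigned to a grid cell in one pass. Cells holding at least `min_pts` points are kept as dense. Touching dense cells are then merged into clusters. The function name `grid_density_cluster` and its parameters `cell_size` and `min_pts` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_density_cluster(points: Sequence[Sequence[float]],
                         cell_size: float, min_pts: int) -> List[int]:
    """Hypothetical sketch: return one cluster label per point; -1 marks noise."""
    # Pass 1, linear in the number of points: map each point to its grid cell
    # (floor division also handles negative coordinates) and count per cell.
    cell_of: List[Cell] = []
    counts: Dict[Cell, int] = defaultdict(int)
    for p in points:
        cell = tuple(int(c // cell_size) for c in p)
        cell_of.append(cell)
        counts[cell] += 1

    # Keep only dense cells; points in sparse cells become noise.
    dense = {c for c, n in counts.items() if n >= min_pts}

    # Pass 2: flood-fill over dense cells, linking cells that touch,
    # diagonals included. Each cell checks 3^d neighbours, which is a
    # constant for a fixed dimension d.
    label_of_cell: Dict[Cell, int] = {}
    next_label = 0
    for start in dense:
        if start in label_of_cell:
            continue
        label_of_cell[start] = next_label
        stack = [start]
        while stack:
            cell = stack.pop()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(a + b for a, b in zip(cell, offset))
                if nb in dense and nb not in label_of_cell:
                    label_of_cell[nb] = next_label
                    stack.append(nb)
        next_label += 1

    return [label_of_cell.get(c, -1) for c in cell_of]


# Usage: two adjacent dense cells, one distant dense cell, one isolated point.
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),   # cell (0, 0)
          (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),   # cell (1, 0), touches (0, 0)
          (5.1, 5.1), (5.2, 5.3), (5.4, 5.2),   # cell (5, 5)
          (9.0, 0.0)]                           # cell (9, 0), sparse
labels = grid_density_cluster(points, cell_size=1.0, min_pts=3)
print(labels)
```

In this example, the first six points share one label, the next three share a different label, and the last point is labeled -1. The specific label numbers depend on set iteration order. In a distributed setting, the cell-counting pass corresponds naturally to a key-by-cell aggregation such as Spark's `reduceByKey`, which is one reason grid-based methods tend to scale horizontally. The text above does not say whether PatchWork uses this approach.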


