spark-knn

k-Nearest Neighbors algorithm on Spark

@saurfang

k-Nearest Neighbors (k-NN) algorithm implemented on Apache Spark. It uses a hybrid spill tree approach to achieve high accuracy and search efficiency, and it scales well both horizontally and with the number of observations and dimensions.
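To make concrete what a k-NN query computes, here is a minimal brute-force sketch in plain Python. It is not this package's implementation and does not use Spark or a spill tree; it only illustrates the exact search that tree-based approaches aim to speed up. The function names `knn_brute_force` and `classify` and the toy data are hypothetical, chosen for illustration.

```python
import heapq
import math
from collections import Counter


def knn_brute_force(train, query, k):
    """Return the k training rows closest to `query` by Euclidean distance.

    `train` is a list of (label, feature_vector) pairs. Every row is
    compared against the query, so the cost grows linearly with len(train).
    """
    return heapq.nsmallest(k, train, key=lambda row: math.dist(row[1], query))


def classify(train, query, k):
    """Predict a label by majority vote among the k nearest neighbors."""
    neighbors = knn_brute_force(train, query, k)
    votes = Counter(label for label, _ in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train = [
    ("a", (0.0, 0.0)),
    ("a", (0.0, 1.0)),
    ("a", (1.0, 0.0)),
    ("b", (5.0, 5.0)),
    ("b", (5.0, 6.0)),
    ("b", (6.0, 5.0)),
]

prediction = classify(train, (0.5, 0.5), k=3)
```

Because each query scans every training row, this baseline becomes impractical on large datasets, which is the situation an indexed, distributed search is meant to address.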


Tags

  • ml
  • machine learning

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages saurfang:spark-knn:0.3.0

sbt

If you use the sbt-spark-package plugin, add the following to your sbt build file:

spDependencies += "saurfang/spark-knn:0.3.0"

Otherwise, add the resolver and the dependency directly:

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "saurfang" % "spark-knn" % "0.3.0"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>saurfang</groupId>
    <artifactId>spark-knn</artifactId>
    <version>0.3.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>

Releases

Version: 0.3.0 ( 0ae87b | zip | jar ) / Date: 2020-02-06 / License: Apache-2.0 / Scala version: 2.11

Version: 0.2.0 ( b8967d | zip | jar ) / Date: 2017-02-07 / License: Apache-2.0 / Scala version: 2.11

Version: 0.1.1 ( 18ca06 | zip | jar ) / Date: 2016-09-05 / License: Apache-2.0 / Scala version: 2.10

Version: 0.0.1-18ca06f5caebe998ec8e0fbbbc7233ce5dc37776 ( 18ca06 | zip | jar ) / Date: 2016-09-05 / License: Apache-2.0 / Scala version: 2.10

Version: 0.1.0 ( 6f274d | zip | jar ) / Date: 2015-11-23 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: - 51% , - 100% , - 7% , - 57% , - 26% , - 82%