spark-neighbors

Approximate nearest neighbor search using locality-sensitive hashing

@karlhigley

Batch computation of the nearest neighbors for each point in a dataset, using:
- Hamming distance via bit sampling LSH
- Cosine distance via sign-random-projection LSH
- Euclidean distance via scalar-random-projection LSH
- Jaccard distance via Minhash LSH
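
This page doesn't show the package's API, but the idea behind one of the schemes above can be sketched briefly. The following is an illustrative Python sketch (not the library's Scala API) of sign-random-projection LSH: each random hyperplane contributes one signature bit (the sign of the dot product), and the Hamming distance between signatures approximates the angle between the original vectors.

```python
import random

def srp_signature(vec, hyperplanes):
    """Sign-random-projection: one bit per hyperplane, the sign of the dot product."""
    return [1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
            for h in hyperplanes]

def hamming(a, b):
    """Number of differing signature bits."""
    return sum(x != y for x, y in zip(a, b))

random.seed(42)
dim, bits = 10, 256  # illustrative sizes, not library defaults

# Random Gaussian hyperplanes define the hash family.
hyperplanes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

base = [random.gauss(0, 1) for _ in range(dim)]
similar = [x + 0.05 * random.gauss(0, 1) for x in base]  # small perturbation of base
unrelated = [random.gauss(0, 1) for _ in range(dim)]     # independent random vector

sig_base = srp_signature(base, hyperplanes)
d_sim = hamming(sig_base, srp_signature(similar, hyperplanes))
d_far = hamming(sig_base, srp_signature(unrelated, hyperplanes))

# Nearby vectors (small angle) disagree on far fewer bits than unrelated ones.
assert d_sim < d_far
```

Bucketing points by signature (or by bands of signature bits) is what turns this into an approximate nearest-neighbor search: only points that collide in some bucket are compared exactly, which is what makes the batch computation tractable.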


Tags

  • machine learning
  • lsh

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages com.github.karlhigley:spark-neighbors_2.10:0.2.2

sbt

If you use the sbt-spark-package plugin, add the following to your sbt build file:

spDependencies += "karlhigley/spark-neighbors:0.2.2"

Otherwise, add:

libraryDependencies += "com.github.karlhigley" % "spark-neighbors_2.10" % "0.2.2"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.github.karlhigley</groupId>
    <artifactId>spark-neighbors_2.10</artifactId>
    <version>0.2.2</version>
  </dependency>
</dependencies>

Releases

Version: 0.2.2 ( 39bdb6 | zip | jar ) / Date: 2016-07-05 / License: MIT / Scala version: 2.10

Version: 0.2.1 ( d3c62d | zip | jar ) / Date: 2016-06-27 / License: MIT / Scala version: 2.10

Version: 0.2.0 ( 1f6d61 | zip | jar ) / Date: 2016-06-27 / License: MIT / Scala version: 2.10

Version: 0.1.0 ( 06de46 | zip | jar ) / Date: 2016-02-23 / License: MIT / Scala version: 2.10