HS_FkNN: Hybrid Spill Tree Fuzzy k Nearest Neighbors.

This is an open-source Spark package implementing two Fuzzy k Nearest Neighbors (FkNN) classifiers on Apache Spark. It takes advantage of Spark's in-memory operations to classify large numbers of unseen cases simultaneously against a large training dataset. Both classifiers work in two stages: class membership degree and classification. The class membership degree stage replaces the class label of each training instance with a vector of class membership degrees. Two approaches are proposed for this stage: Local Hybrid Spill Tree FkNN (LHS_FkNN) and Global Approximate Hybrid Spill Tree FkNN (GAHS-FkNN). LHS_FkNN follows a local approach, computing the membership degrees independently on each partition. GAHS-FkNN follows a global approach based on a Hybrid Spill Tree, taking all the data in the training set into account. The classification stage is common to both models and is also based on a Hybrid Spill Tree.
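The two stages can be illustrated with a small, single-machine Scala sketch. It assumes the classic Fuzzy kNN formulation of Keller et al. (0.51/0.49 membership initialization, then a 1/d^(2/(m-1)) distance-weighted vote) and replaces the Hybrid Spill Tree with a brute-force neighbour search. It is not HS_FkNN's distributed implementation or API; `FuzzyKnnSketch`, `memberships` and `classify` are hypothetical names.

```scala
// Minimal single-machine sketch of the two Fuzzy kNN stages.
// Assumption: textbook Keller-style FkNN, NOT HS_FkNN's distributed code;
// a brute-force neighbour search stands in for the Hybrid Spill Tree.
object FuzzyKnnSketch {

  type Point = Array[Double]

  // Squared Euclidean distance.
  def sqDist(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // Indices of the k training points nearest to q, optionally skipping one index.
  def nearest(train: IndexedSeq[Point], q: Point, k: Int, skip: Int = -1): IndexedSeq[Int] =
    train.indices.filter(_ != skip).sortBy(i => sqDist(train(i), q)).take(k)

  // Stage 1 (class membership degree): replace each crisp label with a vector.
  // u_c(x) = 0.51 + 0.49 * n_c / k if c is x's own label, else 0.49 * n_c / k,
  // where n_c counts x's k nearest neighbours (excluding x itself) labelled c.
  def memberships(train: IndexedSeq[Point], labels: IndexedSeq[Int],
                  numClasses: Int, k: Int): IndexedSeq[Array[Double]] =
    train.indices.map { j =>
      val counts = new Array[Double](numClasses)
      nearest(train, train(j), k, skip = j).foreach(i => counts(labels(i)) += 1.0)
      Array.tabulate(numClasses) { c =>
        val share = 0.49 * counts(c) / k
        if (c == labels(j)) 0.51 + share else share
      }
    }

  // Stage 2 (classification): distance-weighted average of the neighbours'
  // membership vectors, weights 1 / d^(2/(m-1)) with fuzzifier m > 1.
  // Returns the class with the highest resulting membership.
  def classify(train: IndexedSeq[Point], member: IndexedSeq[Array[Double]],
               q: Point, k: Int, m: Double = 2.0): Int = {
    val nn = nearest(train, q, k)
    val eps = 1e-12 // avoids division by zero when q coincides with a training point
    val w = nn.map(i => 1.0 / math.pow(math.sqrt(sqDist(train(i), q)) + eps, 2.0 / (m - 1.0)))
    val numClasses = member.head.length
    val scores = Array.tabulate(numClasses) { c =>
      nn.indices.map(t => member(nn(t))(c) * w(t)).sum / w.sum
    }
    scores.indices.maxBy(c => scores(c))
  }
}
```

Roughly speaking, calling `memberships` separately on each partition's slice of the data mimics the local approach of LHS_FkNN, while calling it once on the whole training set corresponds to an exact global computation, which GAHS-FkNN approximates with a Hybrid Spill Tree. Points near a class boundary receive intermediate membership degrees, so they contribute less decisively in the classification stage.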


  • ml
  • machine learning
  • mllib
  • classification
  • kNN
  • Fuzzy kNN

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages JMailloH:HS_FkNN:1.0


If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "JMailloH/HS_FkNN:1.0"


Otherwise, add:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "JMailloH" % "HS_FkNN" % "1.0"
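For reference, a minimal `build.sbt` using the plain `resolvers`/`libraryDependencies` lines (without the sbt-spark-package plugin) might look like the following sketch. The project name, the Scala 2.11 patch release and the Spark version are assumptions chosen for illustration, not taken from the package; the Spark artifacts are marked `provided` because a Spark cluster supplies them at runtime.

```scala
// build.sbt: minimal sketch without the sbt-spark-package plugin.
// Assumptions: project name, Scala 2.11.12 and Spark 2.4.0 are illustrative only;
// match the Spark version to your cluster.
name := "my-hs-fknn-app"

scalaVersion := "2.11.12"

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies ++= Seq(
  // Provided by the Spark runtime, so not bundled into the application jar.
  "org.apache.spark" %% "spark-core"  % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.4.0" % "provided",
  // The package itself, coordinates as listed above.
  "JMailloH" % "HS_FkNN" % "1.0"
)
```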


In your pom.xml, add:
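  <dependencies>
    <!-- list of dependencies -->
    <dependency>
      <groupId>JMailloH</groupId>
      <artifactId>HS_FkNN</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>
  <repositories>
    <!-- list of other repositories -->
    <repository>
      <id>SparkPackagesRepo</id>
      <url>http://dl.bintray.com/spark-packages/maven</url>
    </repository>
  </repositories>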


Version: 1.0 ( 5621ac | zip | jar ) / Date: 2018-12-30 / License: Apache-2.0 / Scala version: 2.11