Smart Imputation. k Nearest Neighbor Imputation methods

This package implements two k Nearest Neighbor imputation approaches designed to scale to big datasets: k Nearest Neighbor - Local Imputation and k Nearest Neighbor - Global Imputation. The global approach takes all instances into account when computing the k nearest neighbors. The local approach considers only the instances in the same partition, which gives shorter runtimes but loses information because not all samples are considered.
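
The difference between the two approaches can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions, not the package's API: the names KnnImputeSketch, distance, impute, globalImpute and localImpute are hypothetical, missing values are assumed to be encoded as Double.NaN, the distance is assumed to be Euclidean over the features observed in both rows, and a missing value is assumed to be filled with the mean of its k nearest neighbors.

```scala
object KnnImputeSketch {
  // A row of numeric features; Double.NaN marks a missing value (assumption).
  type Row = Array[Double]

  // Euclidean distance over the features that are observed in both rows.
  def distance(a: Row, b: Row): Double = {
    val squared = a.zip(b).collect {
      case (x, y) if !x.isNaN && !y.isNaN => (x - y) * (x - y)
    }
    if (squared.isEmpty) Double.PositiveInfinity else math.sqrt(squared.sum)
  }

  // Fill each missing feature with the mean of that feature over the k
  // nearest candidate rows in which it is observed (assumed fill rule).
  def impute(rows: Seq[Row], candidates: Seq[Row], k: Int): Seq[Row] =
    rows.map { row =>
      row.indices.map { j =>
        if (!row(j).isNaN) row(j)
        else {
          val nearest = candidates
            .filter(c => !(c eq row) && !c(j).isNaN) // skip the row itself
            .map(c => (distance(row, c), c(j)))
            .sortWith(_._1 < _._1)
            .take(k)
            .map(_._2)
          if (nearest.isEmpty) Double.NaN else nearest.sum / nearest.size
        }
      }.toArray
    }

  // Global: every row searches for neighbors in the whole dataset.
  def globalImpute(partitions: Seq[Seq[Row]], k: Int): Seq[Row] = {
    val all = partitions.flatten
    impute(all, all, k)
  }

  // Local: every row searches for neighbors only inside its own partition.
  def localImpute(partitions: Seq[Seq[Row]], k: Int): Seq[Row] =
    partitions.flatMap(p => impute(p, p, k))
}
```

For example, take two partitions, Seq(Array(0.0, 10.0), Array(2.0, Double.NaN)) and Seq(Array(3.0, 12.0), Array(9.0, 50.0)), with k = 1. The global variant fills the missing value with 12.0, because the closest row (3.0, 12.0) lives in the other partition. The local variant can only see (0.0, 10.0) and fills in 10.0. The local search is cheaper because each partition is processed independently, but it can miss closer neighbors.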


  • ml
  • machine learning
  • mllib
  • kNN
  • missing values
  • imputation
  • smart data

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages JMailloH:Smart_Imputation:1.0


If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "JMailloH/Smart_Imputation:1.0"

Otherwise, add:


resolvers += "Spark Packages Repo" at ""

libraryDependencies += "JMailloH" % "Smart_Imputation" % "1.0"


In your pom.xml, add:
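  <dependencies>
    <!-- list of dependencies -->
    <dependency>
      <groupId>JMailloH</groupId>
      <artifactId>Smart_Imputation</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>
  <repositories>
    <!-- list of other repositories -->
  </repositories>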


Version: 1.0 ( 81c686 | zip | jar ) / Date: 2018-04-11 / License: Apache-2.0 / Scala version: 2.11