Smart_Imputation (homepage)
Smart Imputation. k Nearest Neighbor Imputation methods
@JMailloH
This contribution implements two approaches to k Nearest Neighbor imputation, designed for scalability in order to handle big datasets: k Nearest Neighbor - Local Imputation and k Nearest Neighbor - Global Imputation. The global approach takes all instances into account when computing the k nearest neighbors. The local approach considers only the instances in the same partition, which gives faster runtimes but loses information because it does not consider all the samples.
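The difference between the two approaches can be illustrated with a minimal plain-Python sketch. It is not the package's API: the function names are hypothetical, nested lists stand in for Spark partitions, and two details are assumptions made for the illustration (only complete rows act as neighbors, and a missing value is filled with the mean of that feature over the k nearest neighbors).

```python
import math

def distance(row, donor):
    # Euclidean distance over the features observed in `row`; donors are complete.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(row, donor) if x is not None))

def complete_rows(rows):
    # Rows without missing values (assumption: only these act as neighbors).
    return [r for r in rows if None not in r]

def impute(rows, donors, k):
    # Fill each None with the mean of that feature over the k nearest donors.
    result = []
    for row in rows:
        if None not in row or not donors:
            result.append(list(row))  # nothing to fill, or no neighbors available
            continue
        nearest = sorted(donors, key=lambda d: distance(row, d))[:k]
        result.append([
            sum(d[j] for d in nearest) / len(nearest) if value is None else value
            for j, value in enumerate(row)
        ])
    return result

def global_impute(partitions, k):
    # Global: neighbors are searched among the complete rows of every partition.
    donors = complete_rows([r for p in partitions for r in p])
    return [impute(p, donors, k) for p in partitions]

def local_impute(partitions, k):
    # Local: each partition only sees its own complete rows.
    return [impute(p, complete_rows(p), k) for p in partitions]

# Toy data: the incomplete row [9.0, None] sits in the first partition,
# but its closest complete rows live in the second one.
partitions = [
    [[1.0, 10.0], [2.0, 20.0], [9.0, None]],
    [[8.0, 80.0], [10.0, 100.0]],
]

print(global_impute(partitions, 2)[0][2])  # [9.0, 90.0]
print(local_impute(partitions, 2)[0][2])   # [9.0, 15.0]
```

In this toy case the global variant fills the gap from the two closest rows overall, while the local variant can only use the rows of its own partition and produces a much less plausible value. That is the information loss described above, traded for not having to compare against every other partition.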
How to
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages JMailloH:Smart_Imputation:1.0
sbt
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "JMailloH/Smart_Imputation:1.0"
Otherwise,
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"
libraryDependencies += "JMailloH" % "Smart_Imputation" % "1.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>JMailloH</groupId>
    <artifactId>Smart_Imputation</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>
Releases
Version: 1.0 ( 81c686 | zip | jar ) / Date: 2018-04-11 / License: Apache-2.0 / Scala version: 2.11