SmartFiltering

Smart Filtering framework for Big Data

@djgarcia

This framework implements four distance-based Big Data preprocessing algorithms for removing noisy examples: the ENN_BD, AllKNN_BD, NCNEdit_BD and RNG_BD filters, with special emphasis on their scalability and performance.
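
To make the idea of a distance-based noise filter concrete, here is a minimal single-machine sketch of Edited Nearest Neighbor (ENN) filtering, the classic technique that the ENN_BD name suggests it scales up. It is not the SmartFiltering API: the names `EnnSketch`, `Example`, `sqDist`, `neighborVote` and `ennFilter` are hypothetical, it does not use Spark, and it assumes a small in-memory dataset with at least two examples. An example is dropped when its label disagrees with the majority label of its k nearest neighbors.

```scala
// Hypothetical, single-machine sketch of Edited Nearest Neighbor (ENN) filtering.
// This is NOT the SmartFiltering API; every name below is illustrative only.
object EnnSketch {
  final case class Example(features: Vector[Double], label: Int)

  // Squared Euclidean distance. It ranks neighbors the same way as the true distance.
  def sqDist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Majority label among the k nearest neighbors of data(i), excluding data(i) itself.
  // Ties are broken arbitrarily; an odd k avoids them for two-class data.
  def neighborVote(data: Vector[Example], i: Int, k: Int): Int = {
    val neighborLabels = data.indices
      .filter(_ != i)
      .sortBy(j => sqDist(data(i).features, data(j).features))
      .take(k)
      .map(j => data(j).label)
    neighborLabels.groupBy(identity).maxBy { case (_, votes) => votes.size }._1
  }

  // Keep only the examples whose label agrees with the vote of their neighbors.
  def ennFilter(data: Vector[Example], k: Int): Vector[Example] =
    data.indices
      .filter(i => neighborVote(data, i, k) == data(i).label)
      .map(i => data(i))
      .toVector

  def main(args: Array[String]): Unit = {
    val data = Vector(
      Example(Vector(0.0, 0.0), 0),
      Example(Vector(0.1, 0.0), 0),
      Example(Vector(0.0, 0.1), 0),
      Example(Vector(0.05, 0.05), 1), // label 1 placed inside the class-0 cluster
      Example(Vector(5.0, 5.0), 1),
      Example(Vector(5.1, 5.0), 1),
      Example(Vector(5.0, 5.1), 1)
    )
    val cleaned = ennFilter(data, k = 3)
    cleaned.foreach(println)
  }
}
```

This brute-force version compares every pair of examples, so its cost grows quadratically with the dataset size. That cost is what makes scalability the central concern for the Big Data variants listed above.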


  • ml
  • machine learning
  • mllib
  • kNN
  • noise
  • smart data
  • noise filter

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages djgarcia:SmartFiltering:1.0


If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "djgarcia/SmartFiltering:1.0"

Otherwise, add the resolver and the dependency directly:

resolvers += "Spark Packages Repo" at ""

libraryDependencies += "djgarcia" % "SmartFiltering" % "1.0"


In your pom.xml, add:
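<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>djgarcia</groupId>
    <artifactId>SmartFiltering</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
</repositories>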


Version: 1.0 (4f6037) / Date: 2018-04-09 / License: Apache-2.0 / Scala version: 2.11