smote-bd

SMOTE-BD: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data.

@majobasgall

SMOTE-BD is a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on SMOTE, one of the most widespread preprocessing solutions for imbalanced classification, which creates new synthetic instances according to the neighbourhood of each minority-class example.
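To make the neighbourhood-based generation concrete, here is a minimal single-machine sketch of the classic SMOTE interpolation step in plain Scala. It is not the package's distributed implementation or its API: the object name `SmoteSketch`, the function names, and the parameters `k` and `perExample` are hypothetical and chosen only for illustration. It assumes numeric features and at least two minority examples.

```scala
import scala.util.Random

// Illustrative, non-distributed SMOTE sketch (hypothetical names, not the smote-bd API).
object SmoteSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Indices of the k nearest minority neighbours of minority(i), excluding i itself.
  def neighbours(minority: Array[Point], i: Int, k: Int): Array[Int] =
    minority.indices
      .filter(_ != i)
      .sortBy(j => sqDist(minority(i), minority(j)))
      .take(k)
      .toArray

  // Generates `perExample` synthetic points for every minority example.
  // Each synthetic point lies on the segment between the example and
  // one of its k nearest minority neighbours, picked at random.
  def synthesize(minority: Array[Point], k: Int, perExample: Int, rng: Random): Array[Point] = {
    require(minority.length >= 2, "need at least two minority examples")
    require(k >= 1, "k must be positive")
    minority.indices.toArray.flatMap { i =>
      val nn = neighbours(minority, i, k)
      Array.fill(perExample) {
        val neighbour = minority(nn(rng.nextInt(nn.length)))
        val gap = rng.nextDouble() // position along the segment, in [0, 1)
        minority(i).zip(neighbour).map { case (x, n) => x + gap * (n - x) }
      }
    }
  }
}
```

For example, `SmoteSketch.synthesize(minority, 5, 2, new Random(42))` would return two synthetic points per minority example. Because every synthetic point is an interpolation between two existing minority examples, the output stays inside the convex hull of the minority class.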


  • big data
  • Preprocessing
  • imbalanced
  • smote

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages majobasgall:smote-bd:0.1


If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "majobasgall/smote-bd:0.1"


Otherwise, add the repository and the dependency directly:

resolvers += "Spark Packages Repo" at ""

libraryDependencies += "majobasgall" % "smote-bd" % "0.1"


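In your pom.xml, add:
  <dependencies>
    <!-- list of dependencies -->
    <dependency>
      <groupId>majobasgall</groupId>
      <artifactId>smote-bd</artifactId>
      <version>0.1</version>
    </dependency>
  </dependencies>
  <!-- list of other repositories -->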


Version: 0.1 (998700) / Date: 2018-11-14 / License: Apache-2.0 / Scala version: 2.11