smote-bd

SMOTE-BD: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data.

SMOTE-BD is a fully scalable preprocessing approach for imbalanced classification in Big Data. It builds on SMOTE, one of the most widely used preprocessing solutions for imbalanced classification, which creates new synthetic instances from the neighbourhood of each minority-class example.
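To illustrate the underlying idea, here is a minimal single-machine sketch of classic SMOTE interpolation in plain Python. It is not the distributed SMOTE-BD implementation and does not use this package's API; the function names (`k_nearest`, `smote`) and parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def k_nearest(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda other: math.dist(point, other))[:k]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority examples.

    Illustrative only: runs on one machine and scans all minority points
    for every synthetic example.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a minority example and one of its k nearest minority neighbours.
        base = rng.choice(minority)
        candidates = [p for p in minority if p is not base]
        neighbour = rng.choice(k_nearest(base, candidates, k))
        # Place the new point at a random position on the segment between them.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


minority_examples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_examples, n_synthetic=10, k=2)
```

Each synthetic point lies on the line segment between a minority example and one of its nearest minority neighbours, so the new instances stay inside the region already occupied by the minority class.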

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages majobasgall:smote-bd:0.1
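As an alternative to the command-line flag, the same coordinate can probably be supplied through Spark configuration when the session is created from Python. The sketch below assumes the standard Spark properties `spark.jars.packages` and `spark.jars.repositories`; the helper names (`package_conf`, `build_session`) are hypothetical. It only places the JAR on the classpath and does not assume that smote-bd ships a Python API.

```python
COORDINATE = "majobasgall:smote-bd:0.1"
REPOSITORY = "https://repos.spark-packages.org/"


def package_conf(coordinate=COORDINATE, repository=REPOSITORY):
    """Return the Spark properties that mirror the --packages flag."""
    return {
        "spark.jars.packages": coordinate,
        "spark.jars.repositories": repository,
    }


def build_session(app_name="smote-bd-example"):
    """Create a SparkSession with the package on the classpath.

    pyspark is imported lazily so the module loads without it installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in package_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` requires a local Spark installation and network access to resolve the dependency.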

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "majobasgall/smote-bd:0.1"

Otherwise, add the resolver and the dependency directly:

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "majobasgall" % "smote-bd" % "0.1"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>majobasgall</groupId>
    <artifactId>smote-bd</artifactId>
    <version>0.1</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>

Releases

Version: 0.1 (commit 998700) / Date: 2018-11-14 / License: Apache-2.0 / Scala version: 2.11