spark-DEMD-discretizer

A Distributed Evolutionary Multivariate Discretizer (DEMD)

@sramirez

This package provides a Distributed Evolutionary Multivariate Discretizer (DEMD) for data reduction on Spark. The discretizer is evolutionary: it uses a binary chromosome representation and a wrapper fitness function. The algorithm optimizes the selection of cut points by trading off two factors: the simplicity of a solution and its classification accuracy. To reduce the cost of the evolutionary process, the evaluation phase is fully parallelized: both the set of chromosomes and the set of instances are split into partitions, and a random cross-evaluation between them is performed.
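
The trade-off and the cross-evaluation described above can be illustrated with a small, Spark-free sketch. Everything in it is an assumption made for illustration and is not the package's API: the object name DemdSketch, the weight alpha, the single numeric feature, and the majority-vote error estimate standing in for the wrapper classifier. The cross-evaluation is also simplified to run sequentially, with each chromosome scored on one randomly chosen partition of the instances.

```scala
import scala.util.Random

// Hypothetical sketch: all names, the weight alpha, and the error estimate
// are illustrative assumptions, not the spark-DEMD-discretizer API.
object DemdSketch {
  // One labeled instance with a single numeric feature, for brevity.
  final case class Instance(value: Double, label: Int)

  // Index of the interval that the selected cut points assign to a value.
  def bin(value: Double, cuts: Seq[Double]): Int = cuts.count(_ <= value)

  // Stand-in for the wrapper classifier: predict the majority label inside
  // each interval and return the fraction of misclassified instances.
  def errorRate(data: Seq[Instance], cuts: Seq[Double]): Double = {
    require(data.nonEmpty, "data must not be empty")
    val misclassified = data.groupBy(i => bin(i.value, cuts)).values.map { group =>
      group.size - group.groupBy(_.label).values.map(_.size).max
    }.sum
    misclassified.toDouble / data.size
  }

  // Lower is better. Bit i of the chromosome keeps or drops candidate cut i;
  // alpha weights simplicity (fraction of cuts kept) against the error.
  def fitness(chromosome: Seq[Boolean], candidates: Seq[Double],
              data: Seq[Instance], alpha: Double): Double = {
    require(candidates.nonEmpty && chromosome.size == candidates.size)
    val selected = candidates.zip(chromosome).collect { case (c, true) => c }
    val simplicity = selected.size.toDouble / candidates.size
    alpha * simplicity + (1 - alpha) * errorRate(data, selected)
  }

  // Simplified, sequential cross-evaluation: split the instances into
  // partitions and score each chromosome on one randomly chosen partition.
  def crossEvaluate(population: Seq[Seq[Boolean]], candidates: Seq[Double],
                    data: Seq[Instance], alpha: Double,
                    numPartitions: Int, rng: Random): Seq[Double] = {
    require(data.nonEmpty && numPartitions > 0)
    val size = math.ceil(data.size.toDouble / numPartitions).toInt
    val partitions = data.grouped(size).toVector
    population.map { chromosome =>
      fitness(chromosome, candidates, partitions(rng.nextInt(partitions.size)), alpha)
    }
  }
}
```

For four instances with values 1, 2, 3, 4 and labels 0, 0, 1, 1, candidate cuts at 1.5, 2.5 and 3.5, and alpha = 0.5, the chromosome that keeps only the cut at 2.5 scores lower (better) than keeping no cuts or keeping all three, which is the behavior the trade-off is meant to produce.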


  • machine learning
  • discretization

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit (the same --packages flag applies to all three; spark-shell is shown here):

> $SPARK_HOME/bin/spark-shell --packages sramirez:spark-DEMD-discretizer:1.0


If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "sramirez/spark-DEMD-discretizer:1.0"


Otherwise, add the repository and the dependency directly:

resolvers += "Spark Packages Repo" at ""

libraryDependencies += "sramirez" % "spark-DEMD-discretizer" % "1.0"


In your pom.xml, add:
  <!-- list of dependencies -->
  <!-- list of other repositories -->


Version: 1.0 (commit 0966a1) / Date: 2016-02-04 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: - 12% , - 58% , - 48% , - 58% , - 53% , - 58%