A Distributed Evolutionary Multivariate Discretizer (DEMD)
@sramirez
This package provides a Distributed Evolutionary Multivariate Discretizer (DEMD) for data reduction on Spark. The discretizer is evolutionary: it uses a binary chromosome representation and a wrapper fitness function. The algorithm optimizes cut-point selection by trading off two factors: the simplicity of a solution and its classification accuracy. To reduce the cost of the evolutionary process, the evaluation phase is fully parallelized: both the set of chromosomes and the set of instances are split into partitions, and a random cross-evaluation between them is performed.
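The ideas above can be illustrated with a small, single-machine sketch in plain Python. It is not the package's implementation or API: the names `fitness`, `split`, and `cross_evaluate`, the weight `alpha`, the majority-vote stand-in classifier, and the rule of pairing each chromosome partition with one randomly chosen data partition are all assumptions made for illustration.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict


def fitness(chromosome, candidate_cuts, rows, labels, alpha=0.5):
    """Weighted sum of cut-point ratio and error rate (lower is better).

    chromosome: list of 0/1 flags, one per candidate cut point.
    candidate_cuts: list of (feature_index, threshold), aligned with chromosome.
    alpha: hypothetical weight between simplicity and error.
    """
    # Decode the chromosome: keep only the cut points whose bit is 1.
    selected = defaultdict(list)
    for bit, (feat, thr) in zip(chromosome, candidate_cuts):
        if bit:
            selected[feat].append(thr)
    for cuts in selected.values():
        cuts.sort()

    # Discretize each row: a value maps to the index of its interval.
    n_features = len(rows[0])
    coded = [
        tuple(bisect_right(selected[f], row[f]) for f in range(n_features))
        for row in rows
    ]

    # Stand-in wrapper classifier (an assumption): predict the majority
    # class of each discretized pattern and count the misclassified rows.
    votes = defaultdict(Counter)
    for pattern, label in zip(coded, labels):
        votes[pattern][label] += 1
    errors = sum(sum(c.values()) - max(c.values()) for c in votes.values())

    simplicity = sum(chromosome) / len(chromosome)
    error_rate = errors / len(rows)
    return alpha * simplicity + (1 - alpha) * error_rate


def split(items, n_parts):
    """Split a list into n_parts round-robin slices."""
    return [items[i::n_parts] for i in range(n_parts)]


def cross_evaluate(chromosomes, candidate_cuts, rows, labels,
                   n_chrom_parts=2, n_data_parts=2, alpha=0.5, seed=0):
    """Score every chromosome on one randomly chosen data partition."""
    rng = random.Random(seed)
    data_parts = split(list(zip(rows, labels)), n_data_parts)
    scores = {}
    for chrom_part in split(list(enumerate(chromosomes)), n_chrom_parts):
        part = rng.choice(data_parts)  # random pairing of partitions
        part_rows = [r for r, _ in part]
        part_labels = [y for _, y in part]
        for idx, chrom in chrom_part:
            scores[idx] = fitness(chrom, candidate_cuts,
                                  part_rows, part_labels, alpha)
    return [scores[i] for i in range(len(chromosomes))]


if __name__ == "__main__":
    rows = [[1.0], [2.0], [3.0], [4.0]]
    labels = [0, 0, 1, 1]
    cuts = [(0, 1.5), (0, 2.5), (0, 3.5)]
    population = [[0, 1, 0], [0, 0, 0], [1, 1, 1]]
    print(cross_evaluate(population, cuts, rows, labels))
```

In this toy data set, the chromosome that keeps only the middle cut point scores better than both the empty one (too many errors) and the full one (too many cut points), which is the trade-off the fitness function is meant to capture. In a Spark setting, the loop over partition pairs would presumably run as parallel tasks rather than sequentially.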
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages sramirez:spark-DEMD-discretizer:1.0
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "sramirez/spark-DEMD-discretizer:1.0"
Otherwise, add:
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "sramirez" % "spark-DEMD-discretizer" % "1.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>sramirez</groupId>
    <artifactId>spark-DEMD-discretizer</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>