yggdrasil

Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark

@fabuzaid21

Yggdrasil is a more efficient way to train decision trees in Apache Spark when the trees are deep and the dataset has a large number of features. For depths greater than 10, Yggdrasil is an order of magnitude faster than Spark MLlib v1.6.0.


  • machine learning

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages fabuzaid21:yggdrasil:1.0.1


If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "fabuzaid21/yggdrasil:1.0.1"


Otherwise, add:

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "fabuzaid21" % "yggdrasil" % "1.0.1"


In your pom.xml, add:
  <!-- list of dependencies -->
  <!-- list of other repositories -->
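
A minimal sketch of the two pom.xml sections, assuming the Maven coordinates mirror the sbt line above (groupId fabuzaid21, artifactId yggdrasil, version 1.0.1). The repository id is an arbitrary label chosen for this sketch; the URL is the Spark Packages resolver already given for sbt.

```xml
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <!-- coordinates assumed to match the sbt libraryDependencies line -->
    <groupId>fabuzaid21</groupId>
    <artifactId>yggdrasil</artifactId>
    <version>1.0.1</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <!-- hypothetical id; any unique label works -->
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>
```

If your pom.xml already has dependencies or repositories sections, add only the inner dependency and repository elements to them.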


Version: 1.0.1 ( 5de595 | zip | jar ) / Date: 2018-05-11 / License: Apache-2.0 / Scala version: 2.10

Version: 1.0 ( f2bf92 | zip | jar ) / Date: 2016-06-07 / License: Apache-2.0 / Scala version: 2.10