sparkxgboost (homepage)

Gradient boosted trees with arbitrary user-defined loss functions

SparkXGBoost is a Spark implementation of gradient boosted trees that uses a second-order approximation of an arbitrary user-defined loss function. It is inspired by the XGBoost project.
SparkXGBoost ships with the following Loss classes:
* SquareLoss for linear (normal) regression
* LogisticLoss for binary classification
* PoissonLoss for Poisson regression of count data 
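
To make the second-order approximation concrete, the sketch below shows one way the three losses could expose first and second derivatives with respect to the raw prediction. The names (`LossSketch`, `SketchLoss`, `SquareSketch`, `LogisticSketch`, `PoissonSketch`), the 1/2 scaling of the square loss, and the log-link form of the Poisson loss are assumptions made for illustration; they are not taken from SparkXGBoost's own Loss classes.

```scala
// Illustrative sketch only: hypothetical names, not SparkXGBoost's API.
object LossSketch {

  trait SketchLoss {
    // Loss for label y at raw prediction f.
    def value(y: Double, f: Double): Double
    // First derivative of the loss with respect to f.
    def grad(y: Double, f: Double): Double
    // Second derivative of the loss with respect to f.
    def hess(y: Double, f: Double): Double

    // Second-order Taylor approximation of the loss at f + delta.
    def approx(y: Double, f: Double, delta: Double): Double =
      value(y, f) + grad(y, f) * delta + 0.5 * hess(y, f) * delta * delta
  }

  // Square loss, assumed here to be 0.5 * (f - y)^2.
  object SquareSketch extends SketchLoss {
    def value(y: Double, f: Double): Double = 0.5 * (f - y) * (f - y)
    def grad(y: Double, f: Double): Double = f - y
    def hess(y: Double, f: Double): Double = 1.0
  }

  // Logistic loss on a raw margin f, with label y in {0, 1}.
  object LogisticSketch extends SketchLoss {
    private def sigmoid(f: Double): Double = 1.0 / (1.0 + math.exp(-f))
    def value(y: Double, f: Double): Double = math.log(1.0 + math.exp(f)) - y * f
    def grad(y: Double, f: Double): Double = sigmoid(f) - y
    def hess(y: Double, f: Double): Double = sigmoid(f) * (1.0 - sigmoid(f))
  }

  // Poisson loss with a log link (mean = exp(f)); the log(y!) constant is omitted.
  object PoissonSketch extends SketchLoss {
    def value(y: Double, f: Double): Double = math.exp(f) - y * f
    def grad(y: Double, f: Double): Double = math.exp(f) - y
    def hess(y: Double, f: Double): Double = math.exp(f)
  }

  def main(args: Array[String]): Unit = {
    // For the square loss the approximation coincides with the true loss.
    println(SquareSketch.approx(3.0, 1.0, 0.5))
    println(SquareSketch.value(3.0, 1.5))
  }
}
```

Under these assumptions, a user-defined loss only needs to supply the two derivatives; the tree learner can then work with the quadratic approximation instead of the loss itself.
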
To avoid overfitting, SparkXGBoost employs the following regularization methods:
* Shrinkage by learning rate (or step size)
* L2 regularization term on node
* L1 regularization term on node
* Stochastic gradient boosting (similar to Bagging)
* Feature subsampling for learning nodes
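
As a rough illustration of how these regularizers can interact, the sketch below follows the leaf-weight formula used in the XGBoost formulation: the L1 term soft-thresholds the summed gradient, the L2 term is added to the summed second derivative, and the learning rate scales the result. Whether SparkXGBoost computes node predictions in exactly this way is an assumption, and all names here (`RegularizationSketch`, `leafWeight`, `shrunkenUpdate`, `sampleFeatures`) are hypothetical.

```scala
// Illustrative sketch only: hypothetical names, not SparkXGBoost's API.
object RegularizationSketch {

  // Soft-thresholding: moves g toward zero by alpha (the effect of an L1 penalty).
  def softThreshold(g: Double, alpha: Double): Double =
    if (g > alpha) g - alpha
    else if (g < -alpha) g + alpha
    else 0.0

  // Weight w minimizing gradSum * w + 0.5 * (hessSum + lambda) * w^2 + alpha * |w|,
  // where gradSum and hessSum are sums of first and second derivatives in the node.
  def leafWeight(gradSum: Double, hessSum: Double, lambda: Double, alpha: Double): Double =
    -softThreshold(gradSum, alpha) / (hessSum + lambda)

  // Shrinkage: only a fraction (the learning rate) of the leaf weight is applied.
  def shrunkenUpdate(gradSum: Double, hessSum: Double, lambda: Double,
                     alpha: Double, learningRate: Double): Double =
    learningRate * leafWeight(gradSum, hessSum, lambda, alpha)

  // Feature subsampling: pick a random subset of feature indices for a node.
  def sampleFeatures(numFeatures: Int, fraction: Double, seed: Long): Seq[Int] = {
    val rng = new scala.util.Random(seed)
    val k = math.max(1, math.ceil(numFeatures * fraction).toInt)
    rng.shuffle((0 until numFeatures).toList).take(k).sorted
  }

  def main(args: Array[String]): Unit = {
    println(leafWeight(-4.0, 3.0, 1.0, 0.0))
    println(leafWeight(-4.0, 3.0, 1.0, 1.0))
    println(shrunkenUpdate(-4.0, 3.0, 1.0, 0.0, 0.1))
    println(sampleFeatures(10, 0.5, 42L))
  }
}
```

In this sketch, a larger `lambda` or `alpha` pulls leaf weights toward zero, and a smaller `learningRate` makes each tree contribute less, so more trees are typically needed to reach the same fit.
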
SparkXGBoost can process multiple learning nodes in a single pass over the training data to improve efficiency.
Thank you for testing the early release. Please feel free to provide your feedback or bug reports via GitHub Issues.


Tags

  • machine learning

How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages rotationsymmetry:sparkxgboost:0.2.1-s_2.10

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "rotationsymmetry/sparkxgboost:0.2.1-s_2.10"

Otherwise,

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "rotationsymmetry" % "sparkxgboost" % "0.2.1-s_2.10"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>rotationsymmetry</groupId>
    <artifactId>sparkxgboost</artifactId>
    <version>0.2.1-s_2.10</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>

Releases

Version: 0.2.1-s_2.10 ( 697d20 | zip | jar ) / Date: 2015-11-01 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: - 34% , - 100% , - 35% , - 37% , - 34% , - 41%

Version: 0.2.0-s_2.10 ( c79824 | zip | jar ) / Date: 2015-10-30 / License: Apache-2.0 / Scala version: 2.10

Spark Scala/Java API compatibility: - 35% , - 37% , - 34% , - 41% , - 34% , - 100%