gradient boosting tree with arbitrary user-defined loss function
@rotationsymmetry
SparkXGBoost is a Spark implementation of gradient boosted trees that uses a second-order approximation of an arbitrary user-defined loss function. SparkXGBoost is inspired by the XGBoost project.
SparkXGBoost ships with the following Loss classes:
* SquareLoss for linear (normal) regression
* LogisticLoss for binary classification
* PoissonLoss for Poisson regression of count data
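A second-order approximation needs only the first and second derivatives of the loss with respect to the current prediction. The sketch below illustrates this in plain Python for the three losses listed above. It is not the SparkXGBoost API: the function names, the link functions (log-odds for the logistic loss, log-mean for the Poisson loss), and the scaling of each loss are assumptions made for illustration.

```python
import math

# Each function returns (loss, gradient, hessian) with respect to the raw
# prediction f. Names and parameterizations are hypothetical, not taken
# from SparkXGBoost. The formulas are numerically naive for very large |f|.

def square_loss(y, f):
    """Assumed form: 0.5 * (f - y)^2."""
    return 0.5 * (f - y) ** 2, f - y, 1.0

def logistic_loss(y, f):
    """Assumed form: log(1 + e^f) - y*f, with y in {0, 1} and f the log-odds."""
    p = 1.0 / (1.0 + math.exp(-f))
    return math.log1p(math.exp(f)) - y * f, p - y, p * (1.0 - p)

def poisson_loss(y, f):
    """Assumed form: e^f - y*f, with f the log of the mean (log(y!) dropped)."""
    mu = math.exp(f)
    return mu - y * f, mu - y, mu

def second_order_approx(loss_fn, y, f, delta):
    """Taylor expansion of the loss around f, evaluated at f + delta."""
    loss, grad, hess = loss_fn(y, f)
    return loss + grad * delta + 0.5 * hess * delta ** 2
```

A user-defined loss would, under this reading, only have to supply the same three quantities.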
To avoid overfitting, SparkXGBoost employs the following regularization methods:
* Shrinkage by learning rate (or step size)
* L2 regularization term on node
* L1 regularization term on node
* Stochastic gradient boosting (similar to Bagging)
* Feature subsampling for learning nodes
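To make the first three items concrete, the sketch below shows how shrinkage and the L1 and L2 terms could enter the weight assigned to a single node. It follows the closed-form solution used in the XGBoost formulation. Whether SparkXGBoost computes node weights in exactly this way is an assumption, and all names and default values are hypothetical.

```python
def soft_threshold(g, alpha):
    """Shrink g toward zero by alpha; this is how the L1 term acts."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0

def leaf_weight(grad_sum, hess_sum, l2=1.0, l1=0.0, learning_rate=1.0):
    """Weight w minimizing G*w + 0.5*(H + l2)*w^2 + l1*|w|, then shrunk.

    grad_sum (G) and hess_sum (H) are the sums of the per-row gradients
    and hessians of the loss over the rows that fall in the node.
    """
    raw = -soft_threshold(grad_sum, l1) / (hess_sum + l2)
    return learning_rate * raw
```

For example, with `grad_sum=-4.0`, `hess_sum=3.0` and `l2=1.0`, the unregularized-by-L1 weight is 1.0; setting `l1=1.0` reduces it to 0.75, and a learning rate below 1 scales it down further.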
SparkXGBoost can process multiple learning nodes in a single pass over the training data, which improves efficiency.
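One way to picture the single-pass idea is to accumulate gradient and hessian sums for every node being learned while iterating over the rows once, keyed by node, feature and feature bin. The following is a minimal single-machine sketch under that assumption. It does not reflect SparkXGBoost's actual data structures, and every name in it is hypothetical.

```python
from collections import defaultdict

def accumulate_stats(binned_rows, node_of_row, grads, hessians):
    """Collect per-(node, feature, bin) statistics in one pass over the rows.

    binned_rows[i] holds the bin index of each feature for row i.
    node_of_row[i] is the id of the node row i currently belongs to,
    or None if that node is not being learned in this pass.
    Returns {(node, feature, bin): [gradient_sum, hessian_sum]}.
    """
    stats = defaultdict(lambda: [0.0, 0.0])
    for i, bins in enumerate(binned_rows):
        node = node_of_row[i]
        if node is None:
            continue  # row does not contribute to any node in this pass
        for feature, bin_index in enumerate(bins):
            cell = stats[(node, feature, bin_index)]
            cell[0] += grads[i]
            cell[1] += hessians[i]
    return dict(stats)
```

Candidate splits for all nodes can then be scored from the returned sums without reading the data again.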
Thank you for testing the early release. Please feel free to provide your feedback or bug reports via GitHub Issues.
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages rotationsymmetry:sparkxgboost:0.2.1-s_2.10
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "rotationsymmetry/sparkxgboost:0.2.1-s_2.10"
Otherwise, add:
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "rotationsymmetry" % "sparkxgboost" % "0.2.1-s_2.10"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>rotationsymmetry</groupId>
    <artifactId>sparkxgboost</artifactId>
    <version>0.2.1-s_2.10</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>