spark-sklearn

Scikit-learn integration package for Apache Spark

@databricks

This package contains tools to integrate the Spark computing framework with the popular scikit-learn machine learning library. Among other things, it can:

1) train and evaluate multiple scikit-learn models in parallel, as a distributed analog of the multicore implementation included by default in scikit-learn;
2) convert Spark DataFrames seamlessly into NumPy ndarrays or sparse matrices;
3) (experimental) distribute SciPy sparse matrices as a dataset of sparse vectors.
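The idea behind item 1 is that each point of a hyperparameter grid can be fitted and scored independently of the others, so the evaluations can be spread across workers. The sketch below illustrates that idea locally with only the Python standard library; it is not spark-sklearn's API. The package's own entry point for this (a drop-in `GridSearchCV` that takes a SparkContext) runs these same independent tasks on a Spark cluster instead. The names `expand_grid`, `fit_and_score`, and `parallel_grid_search` are hypothetical, and the "model" is a toy linear function standing in for a real scikit-learn estimator.

```python
"""Local sketch of parallel grid search (hypothetical helpers, not spark-sklearn)."""
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_grid(param_grid):
    """Turn {'a': [1, 2], 'b': [3]} into [{'a': 1, 'b': 3}, {'a': 2, 'b': 3}]."""
    keys = sorted(param_grid)
    return [dict(zip(keys, values))
            for values in product(*(param_grid[k] for k in keys))]


def fit_and_score(params, xs, ys):
    """Hypothetical stand-in for fitting and scoring a scikit-learn model.

    The 'model' is y = slope * x + intercept with the given parameters;
    the score is the negative mean squared error (higher is better).
    """
    preds = [params["slope"] * x + params["intercept"] for x in xs]
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    return -mse


def parallel_grid_search(param_grid, xs, ys, max_workers=4):
    candidates = expand_grid(param_grid)
    # Candidates do not depend on each other, so they can be evaluated
    # concurrently; on Spark, each one would become a task in a distributed job.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda p: fit_and_score(p, xs, ys), candidates))
    # Pick the highest score; the key avoids comparing the dicts themselves.
    best_score, best_params = max(zip(scores, candidates), key=lambda t: t[0])
    return best_params, best_score, list(zip(candidates, scores))


if __name__ == "__main__":
    xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]  # generated by y = 2x + 1
    grid = {"slope": [1, 2, 3], "intercept": [0, 1]}
    best, score, _ = parallel_grid_search(grid, xs, ys)
    print(best, score)
```

A thread pool keeps the sketch self-contained; because of Python's GIL it shows the independence of the tasks rather than a real CPU speedup, which is exactly the gap a cluster-backed implementation is meant to close.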


Tags

  • machine learning
  • ml
  • scikit-learn

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages databricks:spark-sklearn:0.2.3

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "databricks/spark-sklearn:0.2.3"

Otherwise, add the Spark Packages repository and the dependency directly:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "databricks" % "spark-sklearn" % "0.2.3"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>databricks</groupId>
    <artifactId>spark-sklearn</artifactId>
    <version>0.2.3</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Releases

Version: 0.2.3 ( cb6e91 | zip | jar ) / Date: 2017-09-29 / License: BSD 3-Clause / Scala version: 2.10

Version: 0.2.2 ( b78196 | zip | jar ) / Date: 2017-09-29 / License: BSD 3-Clause / Scala version: 2.10

Version: 0.2.0 ( 101a95 | zip | jar ) / Date: 2016-08-30 / License: BSD 3-Clause / Scala version: 2.10

Version: 0.1.1 ( 6af376 | zip ) / Date: 2016-02-24 / License: BSD 3-Clause