spark-corenlp (homepage)

A Stanford CoreNLP wrapper for Apache Spark

@databricks

Spark-CoreNLP wraps the Stanford CoreNLP annotation pipeline as a Transformer under the Spark ML pipeline API. It reads a string column of documents, applies the CoreNLP annotators to each document, and writes the resulting annotations to an output column.
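To illustrate the Transformer described above, here is a minimal sketch. The class name `CoreNLP` and its setters (`setInputCol`, `setOutputCol`, `setAnnotators`) are assumptions based on the description and on ML pipeline conventions, not confirmed against the 0.2.0 API; consult the package homepage for the exact interface.

```scala
// Hypothetical usage sketch; class and setter names are assumptions,
// following standard Spark ML Transformer conventions.
import org.apache.spark.sql.SparkSession
import com.databricks.spark.corenlp.CoreNLP  // assumed package path

val spark = SparkSession.builder().appName("corenlp-demo").getOrCreate()
import spark.implicits._

// A string column ("text") holding one document per row.
val docs = Seq(
  (1, "Stanford University is located in California.")
).toDF("id", "text")

// Configure the Transformer: read "text", annotate, write to "parsed".
val corenlp = new CoreNLP()
  .setInputCol("text")
  .setOutputCol("parsed")
  .setAnnotators(Array("tokenize", "ssplit", "pos"))

// The output column contains the CoreNLP annotations for each document.
corenlp.transform(docs).select("parsed").show(truncate = false)
```

Because it is a standard `Transformer`, it can also be composed with other stages in an `org.apache.spark.ml.Pipeline`.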


Tags

  • machine learning
  • NLP
  • NER
  • POS

How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.10
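The same `--packages` flag works for pyspark and spark-submit; for example (note that the artifact suffix must match your Scala version, `s_2.10` or `s_2.11`, per the Releases section below):

```shell
# Same coordinates, different launcher; requires a local Spark installation.
$SPARK_HOME/bin/pyspark --packages databricks:spark-corenlp:0.2.0-s_2.10
```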

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "databricks/spark-corenlp:0.2.0-s_2.10"

Otherwise, add the Spark Packages resolver and the dependency directly:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "databricks" % "spark-corenlp" % "0.2.0-s_2.10"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>databricks</groupId>
    <artifactId>spark-corenlp</artifactId>
    <version>0.2.0-s_2.10</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Releases

Version: 0.2.0-s_2.10 ( 68e907 | zip | jar ) / Date: 2016-08-29 / License: GPL-3.0 / Scala version: 2.10

Version: 0.2.0-s_2.11 ( 68e907 | zip | jar ) / Date: 2016-08-29 / License: GPL-3.0 / Scala version: 2.11

Version: 0.1 ( c7a789 | zip | jar ) / Date: 2016-06-28 / License: GPL-3.0 / Scala version: 2.10