Natural Language Processing Library for Apache Spark.
@JohnSnowLabs
Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in 192+ languages. It supports state-of-the-art transformers such as BERT, XLNet, ELMo, ALBERT, and Universal Sentence Encoder that can be used seamlessly in a cluster. It also offers Tokenization, Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition, Dependency Parsing, Spell Checking, Multi-class Text Classification, Multi-class Sentiment Analysis, Machine Translation (180+ languages), Summarization and Question Answering (Google T5), and many more NLP tasks.
NOTE: We no longer publish Spark NLP on spark-packages; please check the main repo (https://github.com/JohnSnowLabs/spark-nlp) or the Maven repository (https://mvnrepository.com/artifact/com.johnsnowlabs.nlp).
Include this package in your Spark applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.0.1
sbt
In your sbt build file, add:
libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp-gpu_2.12" % "3.0.1"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>3.0.1</version>
  </dependency>
</dependencies>
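Once the package is on the classpath, a pretrained pipeline can be loaded and applied directly. A minimal Scala sketch, assuming a local Spark session and the English "explain_document_dl" pretrained pipeline (downloaded automatically on first use, so network access is required):

```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import org.apache.spark.sql.SparkSession

object SparkNLPExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; on a cluster, reuse your existing session.
    val spark = SparkSession.builder()
      .appName("spark-nlp-example")
      .master("local[*]")
      .getOrCreate()

    // Loads (and on first run downloads) a pretrained English pipeline.
    val pipeline = new PretrainedPipeline("explain_document_dl", lang = "en")

    // Annotate a single string; the result maps each annotator's
    // output column (e.g. "token", "pos", "ner") to its annotations.
    val result = pipeline.annotate("Spark NLP is built on top of Apache Spark ML.")
    println(result("token"))

    spark.stop()
  }
}
```

In spark-shell the same two pipeline lines can be pasted as-is, since the shell already provides a SparkSession.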