spark-word2vec

A parallel implementation of word2vec based on Spark

@chen-lin

spark-word2vec creates vector representations of words in a text corpus. It is based on the word2vec implementation in Spark MLlib and applies several optimization techniques to make the algorithm more scalable and accurate.

Features

1. Two model architectures, CBOW and Skip-gram, are implemented.
2. Both hierarchical softmax and negative sampling are supported for training the model.
3. Sub-sampling of frequent words can be used to achieve both faster training and significantly better representations of uncommon words.
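
To make the features above more concrete, here is a minimal Python sketch of the underlying ideas. It is not taken from this package's source. The function names, the fixed context window, the threshold t = 1e-5, and the 0.75 exponent are assumptions that follow the original word2vec papers, and spark-word2vec may differ in detail. Hierarchical softmax is not shown.

```python
import math


def keep_probability(freq, t=1e-5):
    """Sub-sampling: chance of keeping a word with relative frequency `freq`.

    Follows the rule from the word2vec paper, P(discard) = 1 - sqrt(t / freq),
    so words at or below the threshold `t` are always kept.
    """
    return min(1.0, math.sqrt(t / freq))


def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context) pair per neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: one (context_words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


def negative_sampling_probs(counts, power=0.75):
    """Noise distribution for negative sampling.

    Raw word counts are raised to `power` and normalized, which gives rare
    words a somewhat larger share than their raw frequency would.
    """
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}
```

For example, `skipgram_pairs(["a", "b", "c"], window=1)` predicts each neighbor from the center word, while `cbow_examples` on the same input predicts the center word from its neighbors. A word that makes up 0.1% of the corpus gets a keep probability of about 0.1 under these assumed settings.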


Tags

  • machine learning

How to

This package has no releases published in the Spark Packages repo and no Maven coordinates supplied. You may have to build it from source, or it may simply be a script. To use this Spark Package, follow the instructions in the README.

Releases

No releases yet.