Alternative to Spark machine learning pipeline feature extractors, focused on building sparse feature vectors.
@collectivemedia
Model Matrix is a framework/tool for solving large-scale feature engineering problems: building model features for machine learning with high feature sparsity.
It is built on top of Spark DataFrames and can read input data from, and write ‘featurized’ data to, HDFS (CSV, Parquet) and Hive.
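To make "sparse feature vectors" concrete, here is a minimal plain-Python sketch of the general idea. It is not Model Matrix's API and does not use Spark. The helper names `build_index` and `featurize` and the sample rows are hypothetical and chosen only for illustration. Each distinct (column, value) pair gets its own feature index, and a row is stored as just the indices that are set rather than as a full dense vector.

```python
from collections import Counter

def build_index(rows, columns, min_count=1):
    """Assign a feature index to each (column, value) pair seen at least min_count times."""
    counts = Counter((c, r[c]) for r in rows for c in columns if c in r)
    kept = sorted(key for key, n in counts.items() if n >= min_count)
    return {key: i for i, key in enumerate(kept)}

def featurize(row, index):
    """Encode one row as a sparse vector: (size, sorted indices, values)."""
    indices = sorted(index[(c, v)] for c, v in row.items() if (c, v) in index)
    return len(index), indices, [1.0] * len(indices)

# Hypothetical input rows with two categorical columns.
rows = [
    {"site": "a.com", "os": "ios"},
    {"site": "b.com", "os": "android"},
    {"site": "a.com", "os": "android"},
]

index = build_index(rows, ["site", "os"])
vectors = [featurize(r, index) for r in rows]
```

With many distinct values per column, the vector size grows with the number of (column, value) pairs while each row still stores only one index per column, which is why a sparse representation suits this kind of data.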
This package doesn't have any releases published in the Spark Packages repo, or with maven coordinates supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.
No releases yet.