prep-buddy

A Scala / Java / Python library for cleansing, transforming and preparing large datasets for ML operations on Apache Spark.

A scalable, high-performance Spark library that addresses a range of data processing concerns, such as data quality assurance, data preparation for machine learning, and other data-related tasks.
The library is built on top of Apache Spark (a fast, general-purpose cluster computing system). It scales to large volumes of data and can be used as a stage in a data-cleaning pipeline.
It has APIs in three languages: Scala, Java, and Python.


  • data-cleaning
  • data-sanitization
  • transforming
  • data-prepration

How to

This package doesn't have any releases published in the Spark Packages repo, and no Maven coordinates have been supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.


No releases yet.