DataScienceTools

Tools for outlier detection, discretisation, correlation analysis and text correction.

This package, provided by HUPI, currently contains four tools:

- Outlier detection: statistical tests and/or hypotheses are used to determine whether the RDD contains outliers. A companion function then replaces these points with a summary value such as the mean, the median, the nearest quantile or the nearest "maximum".
- Discretisation: this function discretises a quantitative variable and offers many options.
- Correlation: a set of functions on DataFrames that compute correlation-related measures.
- Text correction: this function corrects your text directly; it mostly removes typing mistakes.
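
To make the first two tools more concrete, here is a minimal sketch in plain Scala collections. It is not the DataScienceTools API: the object and function names (`OutlierSketch`, `fences`, `replaceOutliers`, `discretise`) are hypothetical. The choice of Tukey's IQR rule as the outlier test, the median as the replacement value and equal-width bins as the discretisation scheme are all assumptions made for illustration. On a real RDD or DataFrame the quantiles would be computed in a distributed way rather than by sorting a local sequence.

```scala
// Illustrative sketch only; names and method choices are assumptions,
// not functions provided by the DataScienceTools package.
object OutlierSketch {

  /** Linear-interpolation quantile of an already sorted, non-empty sequence. */
  def quantile(sorted: IndexedSeq[Double], p: Double): Double = {
    val pos = p * (sorted.length - 1)
    val lo = math.floor(pos).toInt
    val hi = math.ceil(pos).toInt
    sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))
  }

  private def sortedCopy(xs: Seq[Double]): IndexedSeq[Double] =
    xs.sortWith(_ < _).toIndexedSeq

  /** Tukey fences (Q1 - k * IQR, Q3 + k * IQR) of a non-empty sequence. */
  def fences(xs: Seq[Double], k: Double = 1.5): (Double, Double) = {
    val s = sortedCopy(xs)
    val q1 = quantile(s, 0.25)
    val q3 = quantile(s, 0.75)
    val iqr = q3 - q1
    (q1 - k * iqr, q3 + k * iqr)
  }

  /** Replace every value outside the fences with the median of the inliers. */
  def replaceOutliers(xs: Seq[Double], k: Double = 1.5): Seq[Double] = {
    val (lo, hi) = fences(xs, k)
    val inliers = xs.filter(x => x >= lo && x <= hi)
    if (inliers.isEmpty) xs // nothing sensible to replace with; leave data as is
    else {
      val median = quantile(sortedCopy(inliers), 0.5)
      xs.map(x => if (x < lo || x > hi) median else x)
    }
  }

  /** Equal-width discretisation: maps each value to a bin index in 0 until bins. */
  def discretise(xs: Seq[Double], bins: Int): Seq[Int] = {
    val lo = xs.reduce(_ min _)
    val hi = xs.reduce(_ max _)
    val width = (hi - lo) / bins
    xs.map { x =>
      if (width == 0.0) 0 // all values identical: single bin
      else math.min(((x - lo) / width).toInt, bins - 1) // clamp the maximum into the last bin
    }
  }
}
```

With these definitions, `OutlierSketch.replaceOutliers(Seq(1.0, 2.0, 3.0, 4.0, 100.0))` keeps the first four values and replaces `100.0` with `2.5`, the median of the inliers, and `OutlierSketch.discretise(Seq(0.0, 5.0, 10.0), 2)` returns the bin indices `0, 1, 1`.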


  • spark
  • scala
  • tools
  • RDD
  • DataFrame
  • outliers
  • discretisation
  • correlation
  • text correction

How to

This package has no releases published in the Spark Packages repo and no Maven coordinates supplied. You may have to build it from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.


No releases yet.