This module provides a simple tool for anonymizing a dataset using Spark. Given a Spark DataFrame (spark.sql.dataframe) with relevant metadata, the library generates an anonymized DataFrame. It supports the following privacy-preserving techniques for dataset anonymization:
K-Anonymity (Mondrian and clustering-based)
L-Diversity (Mondrian)
T-Closeness (Mondrian)
Differential Privacy
Single User Anonymization
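
As a rough illustration of what the first two guarantees in this list mean, the sketch below checks k-anonymity and l-diversity on a small, already-generalized table. It is plain Python and does not use this package's API. The function names (`is_k_anonymous`, `min_l_diversity`), the column names, and the sample rows are all hypothetical, chosen only for the example.

```python
from collections import Counter, defaultdict


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears in at least k rows."""
    group_sizes = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return all(size >= k for size in group_sizes.values())


def min_l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any quasi-identifier group."""
    values_per_group = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in quasi_identifiers)
        values_per_group[key].add(row[sensitive])
    # default=0 covers the empty-table case, where there are no groups at all
    return min((len(v) for v in values_per_group.values()), default=0)


# Hypothetical sample: ages and zip codes are already generalized into ranges/prefixes.
rows = [
    {"age": "20-30", "zip": "130**", "disease": "flu"},
    {"age": "20-30", "zip": "130**", "disease": "cold"},
    {"age": "30-40", "zip": "148**", "disease": "flu"},
    {"age": "30-40", "zip": "148**", "disease": "flu"},
]

two_anonymous = is_k_anonymous(rows, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(rows, ["age", "zip"], 3)
diversity = min_l_diversity(rows, ["age", "zip"], "disease")
```

Here each (age, zip) group contains two rows, so the table is 2-anonymous but not 3-anonymous. The second group holds only one distinct disease, so the table is only 1-diverse: k-anonymity alone does not prevent a sensitive value from being inferred, which is the gap l-diversity and t-closeness aim to close.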

Use "pip install spark-privacy-preserver" to install the package. 



