pyspark-csv (homepage)

An external PySpark module that works like R's read.csv or Panda's read_csv, with automatic type inference and null value handling. Parses csv data into SchemaRDD. No installation required, simply include pyspark_csv.py via SparkContext.

Supports type inference by evaluating data within each column. In the case of column having multiple data types, pyspark-csv will assign the lowest common denominator type for that column. Type inference also gracefully handles null values (e.g., None, ? and '')


Tags

  • 2|python
  • 2|csv
  • 1|sql
  • 1|spark
  • 1|data source

How to

This package doesn't have any releases published in the Spark Packages repo, or with maven coordinates supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.

Releases

No releases yet.