spark-sas7bdat

Remove the splittable part

A library for parsing SAS data files (sas7bdat) with Spark SQL. It also includes a SasInputFormat designed for Hadoop MapReduce. This format is splittable when the input is uncompressed, so it can achieve high parallelism for large SAS files.
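Splitting is possible because, in the commonly described sas7bdat layout, a file is a header followed by fixed-size pages, so a reader can start at any page boundary. The sketch below illustrates only that general idea of page-aligned split planning. It is not this library's SasInputFormat code. `PageSplit`, `PageAlignedSplits.compute`, and the simplification that whole pages start immediately after a header of known length are hypothetical assumptions made for illustration.

```scala
// Hypothetical sketch: page-aligned split planning for an uncompressed sas7bdat file.
// Assumes the file is a header of `headerLength` bytes followed by fixed-size pages.
// This is NOT the library's SasInputFormat implementation.

/** A byte range [start, start + length) that covers whole pages only. */
final case class PageSplit(start: Long, length: Long)

object PageAlignedSplits {
  def compute(
      fileLength: Long,
      headerLength: Long,
      pageSize: Long,
      targetSplitSize: Long
  ): Seq[PageSplit] = {
    require(pageSize > 0 && targetSplitSize > 0, "sizes must be positive")
    require(headerLength >= 0 && fileLength >= headerLength, "invalid header length")

    // Number of complete pages after the header; trailing partial bytes are ignored.
    val totalPages = (fileLength - headerLength) / pageSize
    // Round the target down to whole pages, but always take at least one page per split.
    val pagesPerSplit = math.max(1L, targetSplitSize / pageSize)

    (0L until totalPages by pagesPerSplit).map { firstPage =>
      // The last split may hold fewer pages than the others.
      val pages = math.min(pagesPerSplit, totalPages - firstPage)
      PageSplit(headerLength + firstPage * pageSize, pages * pageSize)
    }
  }

  def main(args: Array[String]): Unit = {
    // Example: 1 KiB header, 5 pages of 64 KiB each, target split size of 128 KiB.
    val splits = compute(1024L + 5 * 65536L, 1024L, 65536L, 128L * 1024)
    splits.foreach(println)
  }
}
```

With these example numbers the plan contains three splits: two of 131072 bytes (two pages each) and a final one of 65536 bytes, so no page is ever cut in half. Rounding the target size down to whole pages is a design choice of this sketch; a real input format would also need the header and page size read from the file itself.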

This library is inspired by spark-csv and currently uses parso for parsing, as it is the only publicly available parser that handles both forms of SAS compression (CHAR and BINARY). Note that parso is licensed under GPL-3, and consequently this library is also licensed under GPL-3.




How to

Include this package in your Spark applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages chhokarpardeep:spark-sas7bdat:1.1.7-s_2.11

sbt

If you use the sbt-spark-package plugin, add the following to your sbt build file:

spDependencies += "chhokarpardeep/spark-sas7bdat:1.1.7-s_2.11"

Otherwise,

resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "chhokarpardeep" % "spark-sas7bdat" % "1.1.7-s_2.11"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>chhokarpardeep</groupId>
    <artifactId>spark-sas7bdat</artifactId>
    <version>1.1.7-s_2.11</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>https://repos.spark-packages.org/</url>
  </repository>
</repositories>

Releases

Version: 1.1.7-s_2.11 ( cb4011 | zip | jar ) / Date: 2017-04-04 / License: GPL-3.0 / Scala version: 2.11

Version: 1.1.6-s_2.11 ( cb4011 | zip | jar ) / Date: 2017-04-03 / License: GPL-3.0 / Scala version: 2.11

Version: 1.1.5-s_2.11 ( cb4011 | zip | jar ) / Date: 2017-04-03 / License: GPL-3.0 / Scala version: 2.11