rumble

Rumble: JSONiq for Apache Spark

@RumbleDB

Rumble is a JSONiq engine that runs on top of Spark.
It makes it easy to query datasets that are highly nested and difficult to manipulate with Spark SQL, and/or that are heterogeneous and do not fit in DataFrames. Rumble can read JSON, text, CSV, Parquet, Avro and other formats wherever they are stored: your local disk, S3, HDFS, etc.
It exposes the high-level, easy-to-learn JSONiq data model (based on sequences of items that scale to billions) and completely hides how it uses Spark. It optimizes queries automagically under the hood.
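To illustrate why nested, heterogeneous records are awkward for a fixed DataFrame schema, here is a minimal plain-Python sketch of JSONiq-style navigation. It is not Rumble's API and does not use Spark; the sample records and the helper names `field` and `members` are hypothetical. It assumes the JSONiq behaviour that looking up a missing key, or unboxing something that is not an array, contributes an empty sequence rather than raising an error.

```python
import json
from typing import Any, Iterable, Iterator

# Hypothetical sample in JSON Lines form: "address" is sometimes absent,
# and "tags" is an array in one record but a plain string in another.
LINES = [
    '{"name": "a", "address": {"city": "Zurich"}, "tags": ["x", "y"]}',
    '{"name": "b", "tags": "z"}',
    '{"name": "c", "address": {"city": "Bern", "zip": 3000}}',
]


def field(items: Iterable[Any], key: str) -> Iterator[Any]:
    """Object lookup in the spirit of JSONiq's `$o.key`.

    Non-objects and objects lacking `key` contribute nothing,
    so the result is simply a shorter (possibly empty) sequence.
    """
    for item in items:
        if isinstance(item, dict) and key in item:
            yield item[key]


def members(items: Iterable[Any]) -> Iterator[Any]:
    """Array unboxing in the spirit of JSONiq's `$a[]`.

    Arrays are flattened into the output sequence; non-arrays are skipped.
    """
    for item in items:
        if isinstance(item, list):
            yield from item


objects = [json.loads(line) for line in LINES]

# Roughly analogous to the JSONiq path `$o.address.city` over all objects.
cities = list(field(field(objects, "address"), "city"))
# Roughly analogous to `$o.tags[]`: the string "z" is not an array, so it is skipped.
tags = list(members(field(objects, "tags")))
```

Here `cities` is `['Zurich', 'Bern']` and `tags` is `['x', 'y']`. Treating data as sequences of items lets these irregularities flow through a query without a predeclared schema, which is the property the description above attributes to the JSONiq data model; Rumble is described as doing this at scale on Spark.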


Tags

  • nosql
  • tools
  • Applications
  • json
  • JSONiq
  • Nested data
  • Heterogeneous data
  • Data preparation

How to

This package doesn't have any releases published in the Spark Packages repo or any Maven coordinates supplied. You may have to build this package from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.

Releases

No releases yet.