gor-spark

Relational query engine that unites SparkSQL and GORpipe into a single declarative query framework.

@gorpipe

GORpipe is a tool built on a genomic ordered relational architecture. It supports analysis of large sets of genomic and phenotypic tabular data using a declarative query language in a parallel execution engine. It is efficient across a wide range of use cases, including genome-wide batch analysis, range queries, genomic table joins of variants and segments, filtering, and aggregation. The query language combines ideas from SQL and Unix shell pipe syntax, supporting seekable nested queries, materialized views, and a rich set of commands and functions. For more information, see the paper in Bioinformatics (https://dx.doi.org/10.1093%2Fbioinformatics%2Fbtw199).




How to

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages org.gorpipe:gor-spark:3.10.2
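The same coordinate can also be supplied from application code rather than on the command line. The following is a minimal sketch, assuming a PySpark application: it builds the `group:artifact:version` string that `--packages` expects and passes it through Spark's `spark.jars.packages` setting. The helper names (`maven_coordinate`, `spark_submit_args`, `build_session`), the app name, and the script name `my_app.py` are hypothetical and used for illustration only. No gor-spark API calls are shown.

```python
import shlex

# Coordinates copied from the spark-shell command above.
GROUP_ID = "org.gorpipe"
ARTIFACT_ID = "gor-spark"
VERSION = "3.10.2"


def maven_coordinate(group_id, artifact_id, version):
    """Return the groupId:artifactId:version string that --packages expects."""
    return f"{group_id}:{artifact_id}:{version}"


def spark_submit_args(app_path, coordinate):
    """Build an argv list for spark-submit (app_path is a hypothetical script)."""
    return ["spark-submit", "--packages", coordinate, app_path]


def build_session(coordinate, app_name="gor-spark-example"):
    """Create a SparkSession that resolves the package at startup.

    Requires pyspark; it is imported lazily so the helpers above
    can be used without it installed.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Standard Spark setting; equivalent to the --packages flag.
        .config("spark.jars.packages", coordinate)
        .getOrCreate()
    )


if __name__ == "__main__":
    coord = maven_coordinate(GROUP_ID, ARTIFACT_ID, VERSION)
    # Print the equivalent spark-submit command line.
    print(shlex.join(spark_submit_args("my_app.py", coord)))
```

Note that `spark.jars.packages` is read when the session's JVM starts, so it is likely to have no effect if a Spark session already exists in the process. In that case the `--packages` flag shown above is the safer route.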

sbt

In your sbt build file, add:

libraryDependencies += "org.gorpipe" % "gor-spark" % "3.10.2"

Maven

In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>org.gorpipe</groupId>
    <artifactId>gor-spark</artifactId>
    <version>3.10.2</version>
  </dependency>
</dependencies>

Releases

Version: 3.10.2 ( 47ca7e | zip | jar ) / Date: 2021-05-09 / License: Apache-2.0