hail

Explore and analyze genomic data.

Hail is an open-source, scalable framework for exploring and analyzing genomic data. Starting from sequencing or microarray data in VCF or other formats, Hail can, for example:

- generate variant annotations like call rate, Hardy-Weinberg equilibrium p-value, and population-specific allele count
- generate sample annotations like mean depth, imputed sex, and transition/transversion (Ti/Tv) ratio
- load variant and sample annotations from text tables, JSON, VCF, VEP, and locus interval files
- generate new annotations from existing annotations and the genotypes, and use these to filter samples, variants, and genotypes
- find Mendelian violations in trios, prune variants in linkage disequilibrium, analyze genetic similarity between samples via the GRM and IBD matrix, and compute sample scores and variant loadings using PCA
- perform variant, gene-burden and eQTL association analyses using linear, logistic, and linear mixed regression, and estimate heritability
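
To make a few of the annotations above concrete, here is a minimal plain-Python sketch of how a per-variant call rate, an alternate allele count, and a Ti/Tv ratio can be computed. It does not use Hail's API: the function names, the genotype encoding (alternate-allele counts with `None` for a missing call), and the `(ref, alt)` tuple representation are all assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple

# Assumed encoding: each genotype is the number of alternate alleles
# (0, 1, or 2) for a diploid sample; None marks a missing call.
Genotype = Optional[int]

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def alt_allele_count(genotypes: Sequence[Genotype]) -> int:
    """Total number of alternate alleles across all called genotypes."""
    return sum(g for g in genotypes if g is not None)


def ti_tv_ratio(snps: Sequence[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) SNP pairs."""
    ti = sum(1 for ref, alt in snps if (ref, alt) in TRANSITIONS)
    tv = sum(
        1 for ref, alt in snps
        if ref != alt and (ref, alt) not in TRANSITIONS
    )
    # With no transversions the ratio is undefined; report infinity.
    return ti / tv if tv else float("inf")


if __name__ == "__main__":
    variant_genotypes = [0, 1, None, 2]  # four samples, one missing call
    print(call_rate(variant_genotypes))
    print(alt_allele_count(variant_genotypes))
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

Hail computes these quantities with distributed algorithms over the full dataset; the sketch only shows the definitions on a handful of in-memory values.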

All of this functionality is exposed through Python and backed by distributed algorithms built on Apache Spark, so Hail can efficiently analyze gigabyte-scale data on a laptop or terabyte-scale data on an on-premises cluster or in the cloud.

How to

This package has no releases published in the Spark Packages repo and no Maven coordinates supplied. You may have to build it from source, or it may simply be a script. To use this Spark Package, please follow the instructions in the README.

Releases

No releases yet.