Spark Packages

A community index of third-party packages for Apache Spark.

Showing packages 1-36 of 36 for search: tags:"Tools"

spark-hbase-connector

Connect Spark to HBase for reading and writing data with ease

@nerdammer / Latest release: 1.0.3 (2016-04-20) / Apache-2.0 / (3)

spark-testing-base

Base classes to use when writing tests with Spark

@holdenk / Latest release: 2.2.2_0.11.0 (2018-12-23) / Apache-2.0 / (10)

sbt-spark-package

sbt plugin for Spark packages

@databricks / Latest release: 0.2.4 (2016-07-15) / Apache-2.0 / (3)

spark-package-cmd-tool

A command line tool for Spark packages

@databricks / Latest release: 0.3.0 (2015-03-17) / Apache-2.0 / (1)

docker-spark

Docker container for a Spark standalone cluster.

@epahomov / No release yet / (0)

spark-archetype-scala

Maven archetype used to bootstrap a Spark Scala project

@mbonaci / Latest release: 0.9 (2015-04-24) / MIT / (0)

sbt-spark-ec2

sbt plugin for spark-ec2

@pishen / No release yet / (0)

spark-sas7bdat

Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL

@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 / (1)

DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet / (11)

spark-deployer

Deploy a Spark cluster the easy way.

@pishen / Latest release: 0.5.1 (2015-06-25) / Apache-2.0 / (0)

sbt-spark-submit

sbt plugin for spark-submit

@saurfang / No release yet / (0)

pipeline

Docker-based, End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark Streaming, ML, MLlib, GraphX, Kafka, Cassandra, Redis, Apache Zeppelin, Spark-Notebook, iPython/Jupyter Notebook, Tableau, H2O Flow, Tachyon,

@fluxcapacitor / No release yet / (3)

soda

Solr Dictionary Annotator (Microservice for Spark)

@elsevierlabs-os / No release yet / (0)

sparkplug

Create composable data processing pipelines in Spark, and execute them on a cluster using simple Scala code

@springnz / No release yet / (0)

SparkRext

SparkR extension for dplyr

@saurfang / No release yet / (0)

sscheck

ScalaCheck for Spark

@juanrh / No release yet / (0)

flintrock

A command-line tool for launching Apache Spark clusters.

@nchammas / No release yet / (1)

ggplot2.SparkR

Rebooting ggplot2 for scalable big data visualization

@SKKU-SKT / No release yet / (3)

spark-compaction

Spark tool to handle file compaction.

@KeithSSmith / Latest release: 1.0.0 (2016-04-22) / Apache-2.0 / (0)

GPUEnabler

Provides GPU awareness to Spark

@ibmsoe / No release yet / (1)

DataScienceTools

Tools for outlier detection, discretisation, correlation analysis, and text correction.

@hupi-analytics / No release yet / (3)

spark-df-profiling

Create HTML profiling reports from Apache Spark DataFrames

@julioasotodv / Latest release: 1.1.2 (2016-07-26) / Apache-2.0 / (1)

baryon

Baryon is a library for building Spark Streaming applications that consume data from Kafka.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

mezzanine

Mezzanine is a library built on Spark Streaming used to consume data from Kafka and store it into Hadoop.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

proxy4sparkui

Configure nginx on the master node as a reverse proxy to the Apache Spark web UI and history server. No more SSH SOCKS proxies or tunnels.

@ekasitk / No release yet / (0)

spark-openstack

OpenStack Spark cluster deployment

@ispras / Latest release: 0.9.5 (2016-11-10) / Apache-2.0 / (0)

parquet-index

Spark SQL index for Parquet tables

@lightcopy / Latest release: 0.5.0-s_2.12 (2020-08-01) / Apache-2.0 / (1)

DIQL

DIQL: A Data Intensive Query Language for Apache Spark

@fegaras / No release yet / (0)

mmlspark

Microsoft Machine Learning for Apache Spark

@Azure / Latest release: 0.17 (2019-04-23) / MIT / (4)

Optimus

Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.

@ironmussa / Latest release: 1.1.0 (2017-10-25) / Apache-2.0 / (2)

ParallelTool

Tool designed to speed up Spark applications

@marino-serna / Latest release: 1.0.1-00 (2018-07-22) / Apache-2.0 / (1)

spark-utils

Basic framework utilities to quickly start writing production-ready Apache Spark applications

@tupol / Latest release: 0.6.1 (2021-10-18) / MIT / (0)

spark-tools

Executable Apache Spark Tools: Format Converter & SQL Processor

@tupol / Latest release: 0.4.1-s_2.11 (2020-09-12) / MIT / (0)

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

@archivesunleashed / Latest release: 0.18.0 (2019-08-21) / Apache-2.0 / (0)

twut

An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.

@archivesunleashed / No release yet / (0)

rumble

Rumble: JSONiq for Apache Spark

@RumbleDB / No release yet / (1)