A community index of third-party packages for Apache Spark.

Showing packages 1 - 36 out of 36 for search tags:"Tools"

Connect Spark to HBase for reading and writing data with ease

@nerdammer / Latest release: 1.0.3 (2016-04-20) / Apache-2.0 / (3)

  • 1|streaming
  • 1|hbase
  • 1|library


Base classes to use when writing tests with Spark

@holdenk / Latest release: 2.2.2_0.11.0 (2018-12-23) / Apache-2.0 / (10)

  • 3|testing
  • 1|streaming
  • 1|tools


sbt plugin for Spark packages

@databricks / Latest release: 0.2.4 (2016-07-15) / Apache-2.0 / (3)

  • 1|tools
  • 1|sbt


A command line tool for Spark packages

@databricks / Latest release: 0.3.0 (2015-03-17) / Apache-2.0 / (1)

  • 1|tools


Docker container for a Spark standalone cluster.

@epahomov / No release yet / (0)

  • 1|tools
  • 1|deployment


Maven archetype used to bootstrap a Spark Scala project

@mbonaci / Latest release: 0.9 (2015-04-24) / MIT / (0)

  • 1|Maven
  • 1|tools
  • 1|scala


sbt plugin for spark-ec2

@pishen / No release yet / (0)

  • 1|tools
  • 1|sbt
  • 1|deployment


Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL

@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 / (1)

  • 1|sas
  • 1|tools
  • 1|sql


Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet / (11)

  • 3|API
  • 2|tools
  • 2|machine learning


Deploy a Spark cluster the easy way.

@pishen / Latest release: 0.5.1 (2015-06-25) / Apache-2.0 / (0)

  • 1|tools
  • 1|sbt
  • 1|deployment


sbt plugin for spark-submit

@saurfang / No release yet / (0)

  • 1|tools
  • 1|sbt
  • 1|deployment


Docker-based, End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark Streaming, ML, MLlib, GraphX, Kafka, Cassandra, Redis, Apache Zeppelin, Spark-Notebook, iPython/Jupyter Notebook, Tableau, H2O Flow, Tachyon,

@fluxcapacitor / No release yet / (3)

  • 2|streaming
  • 2|kafka
  • 1|machine learning


Solr Dictionary Annotator (Microservice for Spark)

@elsevierlabs-os / No release yet / (0)

  • 1|application
  • 1|tools


Create composable data processing pipelines in Spark, and execute them on a cluster using simple Scala code

@springnz / No release yet / (0)

  • 1|application
  • 1|testing
  • 1|tools


SparkR extension for dplyr

@saurfang / No release yet / (0)

  • 1|sparkr
  • 1|r
  • 1|tools


ScalaCheck for Spark

@juanrh / No release yet / (0)

  • 1|streaming
  • 1|testing
  • 1|tools


A command-line tool for launching Apache Spark clusters.

@nchammas / No release yet / (1)

  • 1|tools
  • 1|ec2
  • 1|deployment


Rebooting ggplot2 for scalable big data visualization

@SKKU-SKT / No release yet / (3)

  • 3|visualization
  • 2|r
  • 1|tools


Spark tool to handle file compaction.

@KeithSSmith / Latest release: 1.0.0 (2016-04-22) / Apache-2.0 / (0)

  • 1|tools


Provides GPU awareness to Spark

@ibmsoe / No release yet / (1)

  • 2|GPU
  • 1|spark
  • 1|tools


Some tools for outlier detection, discretisation, correlation analysis and text correction.

@hupi-analytics / No release yet / (3)

  • 1|spark
  • 1|tools
  • 1|scala


Create HTML profiling reports from Apache Spark DataFrames

@julioasotodv / Latest release: 1.1.2 (2016-07-26) / Apache-2.0 / (1)

  • 1|tools
  • 1|pyspark


Baryon is a library for building Spark Streaming applications that consume data from Kafka.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

  • 1|streaming
  • 1|tools
  • 1|library


Mezzanine is a library built on Spark Streaming used to consume data from Kafka and store it into Hadoop.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

  • 1|streaming
  • 1|tools
  • 1|library


Configure nginx on the master node as a reverse proxy to the Apache Spark web UI and history server. No more SSH SOCKS proxies or tunnels.

@ekasitk / No release yet / (0)

  • 1|tool
  • 1|deployment


OpenStack Spark cluster deployment

@ispras / Latest release: 0.9.5 (2016-11-10) / Apache-2.0 / (0)

  • 1|tools
  • 1|deployment


Spark SQL index for Parquet tables

@lightcopy / Latest release: 0.5.0-s_2.12 (2020-08-01) / Apache-2.0 / (1)

  • 1|sql
  • 1|tools
  • 1|parquet


DIQL: A Data Intensive Query Language for Apache Spark

@fegaras / No release yet / (0)

  • 1|tools


Microsoft Machine Learning for Apache Spark

@Azure / Latest release: 0.17 (2019-04-23) / MIT / (4)

  • 3|ml
  • 3|Microsoft
  • 3|machine learning


Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.

@ironmussa / Latest release: 1.1.0 (2017-10-25) / Apache-2.0 / (2)

  • 1|machine learning
  • 1|tools
  • 1|pyspark


Tool designed to speed up Spark applications

@marino-serna / Latest release: 1.0.1-00 (2018-07-22) / Apache-2.0 / (1)

  • 1|tool
  • 1|parallel
  • 1|scala


Basic framework utilities to quickly start writing production-ready Apache Spark applications

@tupol / Latest release: 0.6.1 (2021-10-18) / MIT / (0)

  • 1|tools
  • 1|library
  • 1|scala


Executable Apache Spark Tools: Format Converter & SQL Processor

@tupol / Latest release: 0.4.1-s_2.11 (2020-09-12) / MIT / (0)

  • 1|streaming
  • 1|sql
  • 1|kafka


The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

@archivesunleashed / Latest release: 0.18.0 (2019-08-21) / Apache-2.0 / (0)

  • 1|pyspark
  • 1|tools
  • 1|Web archives


An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.

@archivesunleashed / No release yet / (0)

  • 1|pyspark
  • 1|tools
  • 1|Digital Humanities


Rumble: JSONiq for Apache Spark

@RumbleDB / No release yet / (1)

  • 1|Applications
  • 1|tools
  • 1|nosql