Spark Packages

A community index of third-party packages for Apache Spark.

Showing packages 201 - 250 out of 518

spark-to-tableau

Spark DataFrame to Tableau Data Extract library

@werneckpaiva / Latest release: 0.1.0 (2016-03-01) / Apache-2.0 / (0)

spark-hazelcast-connector

Connects Spark to Hazelcast

@erenavsarogullari / Latest release: 1.0.0-s_2.11 (2016-03-07) / Apache-2.0 / (0)

Spark-CluStream

Adaptation of the CluStream method in Spark

@obackhoff / Latest release: 0.6.5 (2016-03-31) / Apache-2.0 / (1)

mleap

MLeap allows for easily putting Spark ML pipelines into production

@TrueCar / Latest release: 0.1.5 (2016-06-06) / Apache-2.0 / (2)

spark-scdengine

Capture SCD (Slowly Changing Dimension) on Spark

@dhmodi / No release yet / (1)

spark-betweenness

k Betweenness Centrality algorithm for Spark using GraphX

@dmarcous / Latest release: 1.0-s_2.10 (2016-03-14) / Apache-2.0 / (2)

tensorframes

TensorFlow wrapper for DataFrames on Apache Spark

@tjhunter / Latest release: 0.2.2-s_2.10 (2016-05-18) / Apache-2.0 / (0)

ggplot2.SparkR

Rebooting ggplot2 for scalable big data visualization

@SKKU-SKT / No release yet / (3)

word2phrase

word2phrase algorithm for Spark

@s4weng / Latest release: 1.0.1 (2016-04-08) / Apache-2.0 / (0)

stocator

High-performance connector to object storage for Apache Spark. Supports IBM Cloud Object Storage and OpenStack Swift.

@SparkTC / Latest release: 1.1.4 (2021-12-07) / Apache-2.0 / (1)

SparkNet

Distributed Neural Networks for Spark

@amplab / No release yet / (0)

ActiveMQReceiver

ActiveMQ Receiver

@hafizmujadid / No release yet / (0)

spark-parallelized-sgd

Parallelized Stochastic Gradient Descent (SGD) with Apache Spark

@yu-iskw / Latest release: 0.0.2 (2016-03-30) / Apache-2.0 / (0)

Crossdata

Easy access to big things. Library for Apache Spark extending and improving its capabilities

@Stratio / No release yet / (1)

spark-hyperloglog

Algebird's HyperLogLog support for Apache Spark.

@vitillo / Latest release: 1.1.1 (2016-09-14) / Apache-2.0 / (0)

psaml

Python Sensitivity Analysis of ML models in Apache Spark

@psaml / No release yet / (0)

spark-dynamodb

DynamoDB data source for Apache Spark

@traviscrawford / No release yet / (0)

spark-riak-connector

The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

@basho / Latest release: 1.6.3 (2017-03-17) / Apache-2.0 / (2)

spark-compaction

Spark tool to handle file compaction.

@KeithSSmith / Latest release: 1.0.0 (2016-04-22) / Apache-2.0 / (0)

GPUEnabler

Provides GPU awareness to Spark

@ibmsoe / No release yet / (1)

gihyo-spark-book-example

Sample code for the Gijutsu-Hyoron book "詳解Apache Spark" ("Apache Spark in Detail")

@yu-iskw / Latest release: 1.0.1 (2016-04-22) / Apache-2.0 / (1)

spark-crossdata

SparkSQL extension as a library for Apache Spark extending and improving its capabilities for a data federation system.

@Stratio / Latest release: 1.4.0 (2016-07-06) / Apache-2.0 / (6)

mleap-demo

MLeap demo repository for use with MLeap blog posts

@TrueCar / No release yet / (1)

Mobius

C# API for Apache Spark

@Microsoft / Latest release: 1.6.100 (2016-05-02) / MIT / (2)

geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.

@geotrellis / Latest release: 0.10.0 (2016-04-28) / Apache-2.0 / (1)

sparkka-streams

Power a Spark Stream from anywhere in your Akka Stream Flow

@lloydmeta / No release yet / (0)

spark-lucenerdd

Spark RDD with Lucene's query capabilities

@zouzias / Latest release: 0.3.3 (2018-07-24) / Apache-2.0 / (0)

spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames

@spotify / Latest release: 0.2.2-s_2.10 (2017-11-29) / Apache-2.0 / (3)

Imb-sampling-ROS_and_RUS

@saradelrio / No release yet / (0)

spark-metrics

A library to expose Apache Spark's metrics system

@groupon / Latest release: 1.0 (2016-05-21) / BSD 3-Clause / (0)

sparkonda

Minimalistic utility library to manage conda environments for PySpark jobs on YARN clusters

@moutai / No release yet / (0)

neo4j-spark-connector

Officially supported, Apache 2 licensed Neo4j Connector for Apache Spark.

@neo4j-contrib / Latest release: 5.3.1-s_2.13 (2024-07-08) / Apache-2.0 / (2)

yggdrasil

Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark

@fabuzaid21 / Latest release: 1.0.1 (2018-05-11) / Apache-2.0 / (1)

spark-kuromoji-tokenizer

Kuromoji Tokenizer for Spark DataFrame

@yu-iskw / Latest release: 1.2.0 (2016-06-29) / Apache-2.0 / (0)

DataScienceTools

Some tools for outlier detection, discretisation, correlation analysis, and text correction.

@hupi-analytics / No release yet / (3)

tensorframes

TensorFlow wrapper for DataFrames on Apache Spark

@databricks / Latest release: 0.8.2-s_2.11 (2019-10-24) / Apache-2.0 / (4)

mongo-spark

The official MongoDB Spark Connector

@mongodb / Latest release: 3.0.1 (2021-02-03) / Apache-2.0 / (20)

Datasource-Receiver

Spark Receiver for SQL or NoSQL Databases like Cassandra, MongoDB, Elasticsearch or JDBC

@Stratio / Latest release: 0.1.0 (2016-06-30) / Apache-2.0 / (1)

spark-s3

Amazon Web Services S3 library

@EntilZha / No release yet / (0)

Email_Spam_Spark

A small project that predicts which folder an email belongs in: spam or primary.

@phalodi / No release yet / (2)

spark-akka-http-couchbase-starter-kit

CRUD operations on Couchbase using Apache Spark

@shiv4nsh / No release yet / (2)

spark-wordtophrase

Spark RDD based implementation of word2phrase algorithm

@tresata / No release yet / (0)

code

project

@spatnam / No release yet / (0)

spark-df-profiling

Create HTML profiling reports from Apache Spark DataFrames

@julioasotodv / Latest release: 1.1.2 (2016-07-26) / Apache-2.0 / (1)

spark-color-converter

Color RGB to Hex converter

@xta / Latest release: 0.0.3 (2016-08-01) / MIT / (0)

spark-ignite

A sample application that demonstrates sharing RDD state across Spark applications.

@knoldus / No release yet / (2)

baryon

Baryon is a library for building Spark Streaming applications that consume data from Kafka.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

mezzanine

Mezzanine is a library built on Spark Streaming used to consume data from Kafka and store it into Hadoop.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

k-means-pipline

An ML pipeline to cluster DataFrames with categorical values using K-Means

@knoldus / No release yet / (1)

k-means-pipeline

An ML pipeline to cluster DataFrames with categorical values using K-Means

@knoldus / Latest release: 0.0.1 (2016-07-30) / Apache-2.0 / (1)