A community index of third-party packages for Apache Spark.

Showing packages 201 - 250 out of 516

Spark DataFrame to Tableau Data Extract library

@werneckpaiva / Latest release: 0.1.0 (2016-03-01) / Apache-2.0 / (0)

  • 1|tableau
  • 1|DataFrame


Connects Spark to Hazelcast

@erenavsarogullari / Latest release: 1.0.0-s_2.11 (2016-03-07) / Apache-2.0 / (0)

  • 1|streaming
  • 1|spark
  • 1|scala


Adaptation of the CluStream method in Spark

@obackhoff / Latest release: 0.6.5 (2016-03-31) / Apache-2.0 / (1)

  • 1|clustering
  • 1|streaming
  • 1|machine learning


MLeap makes it easy to put Spark ML pipelines into production

@TrueCar / Latest release: 0.1.5 (2016-06-06) / Apache-2.0 / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Capture SCD (Slowly Changing Dimension) on Spark

@dhmodi / No release yet / (1)

  • 1|SCD
  • 1|Change data capture
  • 1|Slowly changing dimension


k Betweenness Centrality algorithm for Spark using GraphX

@dmarcous / Latest release: 1.0-s_2.10 (2016-03-14) / Apache-2.0 / (2)

  • 1|graph
  • 1|centrality


TensorFlow wrapper for DataFrames on Apache Spark

@tjhunter / Latest release: 0.2.2-s_2.10 (2016-05-18) / Apache-2.0 / (0)


Rebooting ggplot2 for scalable big data visualization

@SKKU-SKT / No release yet / (3)

  • 3|visualization
  • 2|r
  • 1|tools


word2phrase algorithm for Spark

@s4weng / Latest release: 1.0.1 (2016-04-08) / Apache-2.0 / (0)

  • 1|phrase
  • 1|spark
  • 1|word


High-performing connector to object storage for Apache Spark. Supports IBM Cloud Object Storage and OpenStack Swift.

@SparkTC / Latest release: 1.1.4 (2021-12-07) / Apache-2.0 / (1)

  • 1|data source
  • 1|Swift
  • 1|data s


Distributed Neural Networks for Spark

@amplab / No release yet / (0)


Active MQ Receiver

@hafizmujadid / No release yet / (0)


Parallelized Stochastic Gradient Descent (SGD) with Apache Spark

@yu-iskw / Latest release: 0.0.2 (2016-03-30) / Apache-2.0 / (0)

  • 1|ml
  • 1|machine learning


Easy access to big things. Library for Apache Spark extending and improving its capabilities

@Stratio / No release yet / (1)


Algebird's HyperLogLog support for Apache Spark.

@vitillo / Latest release: 1.1.1 (2016-09-14) / Apache-2.0 / (0)


Python Sensitivity Analysis of ML models in Apache Spark

@psaml / No release yet / (0)


DynamoDB data source for Apache Spark

@traviscrawford / No release yet / (0)


The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

@basho / Latest release: 1.6.3 (2017-03-17) / Apache-2.0 / (2)

  • 3|python
  • 3|riak
  • 3|data source


Spark tool to handle file compaction.

@KeithSSmith / Latest release: 1.0.0 (2016-04-22) / Apache-2.0 / (0)

  • 1|tools


Provides GPU awareness to Spark

@ibmsoe / No release yet / (1)

  • 2|GPU
  • 1|spark
  • 1|tools


Sample code for the book 「詳解Apache Spark」 ("Apache Spark in Detail"), published by Gijutsu-Hyoron

@yu-iskw / Latest release: 1.0.1 (2016-04-22) / Apache-2.0 / (1)

  • 1|example


SparkSQL extension as a library for Apache Spark extending and improving its capabilities for a data federation system.

@Stratio / Latest release: 1.4.0 (2016-07-06) / Apache-2.0 / (6)

  • 3|SparkSQL
  • 3|sql
  • 2|library


MLeap demo repository for use with MLeap blog posts

@TrueCar / No release yet / (1)


C# API for Apache Spark

@Microsoft / Latest release: 1.6.100 (2016-05-02) / MIT / (2)

  • 1|streaming
  • 1|examples
  • 1|sql


GeoTrellis is a geographic data processing engine for high performance applications.

@geotrellis / Latest release: 0.10.0 (2016-04-28) / Apache-2.0 / (1)

  • 1|raster
  • 1|vector
  • 1|geospatial


Power a Spark Stream from anywhere in your Akka Stream Flow

@lloydmeta / No release yet / (0)


Spark RDD with Lucene's query capabilities

@zouzias / Latest release: 0.3.3 (2018-07-24) / Apache-2.0 / (0)

  • 1|search
  • 1|geospatial
  • 1|GeoJSON


Google BigQuery support for Spark, SQL, and DataFrames

@spotify / Latest release: 0.2.2-s_2.10 (2017-11-29) / Apache-2.0 / (3)

  • 1|input
  • 1|data source
  • 1|sql


Imb-sampling-ROS_and_RUS

@saradelrio / No release yet / (0)

  • 1|undersampling
  • 1|oversampling
  • 1|sampling


A library to expose Apache Spark's metrics system

@groupon / Latest release: 1.0 (2016-05-21) / BSD 3-Clause / (0)

  • 1|metrics
  • 1|application
  • 1|core


Minimalistic utility library to manage conda environments for PySpark jobs on YARN clusters

@moutai / No release yet / (0)


Officially supported, Apache 2 licensed Neo4j Connector for Apache Spark.

@neo4j-contrib / Latest release: 5.3.1-s_2.13 (2024-07-08) / Apache-2.0 / (2)

  • 1|graph
  • 1|data source
  • 1|database


Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark

@fabuzaid21 / Latest release: 1.0.1 (2018-05-11) / Apache-2.0 / (1)

  • 1|machine learning


Kuromoji Tokenizer for Spark DataFrame

@yu-iskw / Latest release: 1.2.0 (2016-06-29) / Apache-2.0 / (0)

  • 1|ml
  • 1|machine learning


Tools for outlier detection, discretisation, correlation analysis and text correction.

@hupi-analytics / No release yet / (3)

  • 1|spark
  • 1|tools
  • 1|scala


TensorFlow wrapper for DataFrames on Apache Spark

@databricks / Latest release: 0.8.2-s_2.11 (2019-10-24) / Apache-2.0 / (4)

  • 2|tensorflow


The official MongoDB Spark Connector

@mongodb / Latest release: 3.0.1 (2021-02-03) / Apache-2.0 / (20)

  • 3|MongoDB
  • 2|Spark SQL
  • 2|nosql


Spark Receiver for SQL or NoSQL Databases like Cassandra, MongoDB, Elasticsearch or JDBC

@Stratio / Latest release: 0.1.0 (2016-06-30) / Apache-2.0 / (1)

  • 1|streaming
  • 1|library
  • 1|sql


Amazon Web Services S3 library

@EntilZha / No release yet / (0)


A small project that predicts which folder an email belongs in: spam or primary.

@phalodi / No release yet / (2)

  • 1|ml
  • 1|example
  • 1|scala


CRUD operations on Couchbase using Apache Spark

@shiv4nsh / No release yet / (2)

  • 1|spark
  • 1|example
  • 1|scala


Spark RDD-based implementation of the word2phrase algorithm

@tresata / No release yet / (0)


project

@spatnam / No release yet / (0)


Create HTML profiling reports from Apache Spark DataFrames

@julioasotodv / Latest release: 1.1.2 (2016-07-26) / Apache-2.0 / (1)

  • 1|tools
  • 1|pyspark


Color RGB to Hex converter

@xta / Latest release: 0.0.3 (2016-08-01) / MIT / (0)


A sample application that demonstrates sharing RDD state across Spark applications.

@knoldus / No release yet / (2)

  • 2|apacheIgniteWithapacheSpark
  • 2|In memory computing
  • 2|SharedRDD


Baryon is a library for building Spark Streaming applications that consume data from Kafka.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

  • 1|streaming
  • 1|tools
  • 1|library


Mezzanine is a library built on Spark Streaming used to consume data from Kafka and store it into Hadoop.

@groupon / Latest release: 1.0 (2016-07-29) / BSD 3-Clause / (0)

  • 1|streaming
  • 1|tools
  • 1|library


An ML pipeline to cluster DataFrames with categorical values using K-Means

@knoldus / No release yet / (1)

  • 1|clustering
  • 1|ml
  • 1|pipeline


An ML pipeline to cluster DataFrames with categorical values using K-Means

@knoldus / Latest release: 0.0.1 (2016-07-30) / Apache-2.0 / (1)

  • 1|ml
  • 1|kmeans
  • 1|etl