A community index of third-party packages for Apache Spark.

Showing packages 101 - 150 out of 512

Infinispan Spark Connector

@infinispan / Latest release: 0.9 (2018-11-05) / Apache-2.0 / (0)

  • 1|streaming
  • 1|sql
  • 1|scala


Joins for skewed datasets in Spark

@tresata / Latest release: 0.2.0-s_2.10 (2015-11-13) / Apache-2.0 / (0)

  • 1|core


Read SparkSQL parquet file as RDD[Protobuf]

@saurfang / Latest release: 0.1.2-s_2.10 (2015-08-18) / Apache-2.0 / (0)

  • 1|data source
  • 1|protobuf
  • 1|sql


Distributed t-SNE via Apache Spark

@saurfang / No release yet / (1)

  • 1|machine learning


sbt plugin for spark-submit

@saurfang / No release yet / (0)

  • 1|tools
  • 1|sbt
  • 1|deployment


Machine learning over Twitter's stream, using Apache Spark, a web server, and a Lightning graph server.

@giorgioinf / Latest release: 0.2.0 (2016-06-19) / GPL-3.0 / (0)

  • 1|ml
  • 1|example
  • 1|streaming


A Stanford CoreNLP wrapper for Apache Spark

@databricks / Latest release: 0.4.0-spark2.4-scala2.11 (2018-11-16) / GPL-3.0 / (2)

  • 2|NLP
  • 2|machine learning
  • 1|NER


TFOCS for Spark, a Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)

@databricks / No release yet / (1)

  • 1|machine learning
  • 1|optimization
  • 1|convex


This is a prototype implementation of Bisecting K-Means Clustering on Spark.

@yu-iskw / Latest release: 0.1.1 (2015-08-28) / Apache-2.0 / (0)

  • 1|clustering
  • 1|machine learning
  • 1|scala


DistML provides a supplement to MLlib to support model parallelism on Spark

@intel-machine-learning / No release yet / (1)

  • 1|parameter server
  • 1|machine learning


Deep Learning for Spark ML

@deeplearning4j / Latest release: 0.4-rc3.4 (2015-10-02) / Apache-2.0 / (1)

  • 1|Spark ML
  • 1|machine learning


Distributed text modeling algorithms with Spark

@dding3 / No release yet / (0)


Spark on Aliyun, supporting interactions with Aliyun's base services.

@aliyun / No release yet / (1)

  • 1|streaming
  • 1|data source


Spark mainframe connector

@Syncsort / Latest release: 1.0.0 (2015-09-01) / Apache-2.0 / (0)

  • 1|input
  • 1|data source
  • 1|sql


Spark Druid Package

@SparklineData / Latest release: 0.1.0 (2016-06-03) / Apache-2.0 / (3)


Implementation of Random Ferns for Apache Spark

@CeON / Latest release: 0.2.0 (2015-10-08) / Apache-2.0 / (0)

  • 3|machine learning


GIS extension for SparkSQL

@drubbo / No release yet / (0)


EventHubs Receiver for Spark Streaming

@hdinsight / No release yet / (0)

  • 1|Azure
  • 1|streaming
  • 1|eventhubs


Distributed Topic Modeling on Apache Spark

@intel-analytics / No release yet / (1)

  • 1|graph
  • 1|LDA
  • 1|machine learning


Phylogenetic tree inference tool

@xingjianxu / No release yet / (0)

  • 1|Phylogenetic
  • 1|Bioinformatics


Linear algebra operators for Apache Spark MLlib's linalg package

@brkyvz / Latest release: 0.1.0 (2015-09-09) / Apache-2.0 / (1)

  • 1|linear algebra
  • 1|lazy
  • 1|machine learning


Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark

@collectivemedia / No release yet / (0)


Script to submit spark jobs on a traditional HPC cluster

@ekasitk / No release yet / (0)

  • 1|deployment


Implementation of Stochastic Outlier Selection (SOS), an unsupervised outlier selection algorithm.

@rug-ds-lab / No release yet / (1)

  • 1|outlier detection


Implementation of Stochastic Outlier Selection (SOS), an unsupervised outlier selection algorithm.

@Fokko / Latest release: 0.1.0 (2015-09-11) / Apache-2.0 / (1)

  • 1|outlier detection


A connector for Spark that allows reading from and writing to a Redis cluster

@RedisLabs / Latest release: 2.3.0 (2018-11-04) / BSD 3-Clause / (3)

  • 2|redis
  • 1|storage
  • 1|data structures


A neural network implementation in Scala

@nearbydelta / No release yet / (0)

  • 1|neural network
  • 1|machine learning


Simplifying robust end-to-end machine learning on Apache Spark.

@amplab / No release yet / (0)


A Spark machine learning package containing an implementation of a classical convolutional neural network

@hhbyyh / No release yet / (1)

  • 1|app
  • 1|machi


Spark-xml-utils provides the ability to filter documents based on an XPath expression, return specific nodes for an XPath/XQuery expression, or transform documents using an XSLT stylesheet.

@elsevierlabs-os / Latest release: 1.10.0 (2021-12-08) / Apache-2.0 / (0)

  • 1|xml
  • 1|xpath
  • 1|xquery


Docker-based, End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark Streaming, ML, MLlib, GraphX, Kafka, Cassandra, Redis, Apache Zeppelin, Spark-Notebook, iPython/Jupyter Notebook, Tableau, H2O Flow, Tachyon,

@fluxcapacitor / No release yet / (3)

  • 2|streaming
  • 2|kafka
  • 1|machine learning


Big Spatial Data Processing using Spark

@syoummer / Latest release: 1.0 (2015-10-08) / Apache-2.0 / (1)

  • 1|geospatial
  • 1|WKT
  • 1|spatial


Function for computing K-NN in Apache Spark

@jakac / Latest release: 0.0.3 (2015-10-06) / Apache-2.0 / (0)


Enhanced Python Dataframes for Spark

@dondrake / No release yet / (0)

  • 1|python
  • 1|sql
  • 1|pyspark


Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (Python and Scala)

@blebreton / No release yet / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Spark Modularized View

@TresAmigosSD / No release yet / (0)

  • 1|core
  • 1|sql


Solr Dictionary Annotator (Microservice for Spark)

@elsevierlabs-os / No release yet / (0)

  • 1|application
  • 1|tools


Spark implementation of Nearest Neighbours Mean Shift using LSH

@Kybe67 / No release yet / (1)

  • 1|lsh
  • 1|machine learning


Library for computing clustering coefficient

@SherlockYang / Latest release: 0.1 (2015-10-22) / LGPL-3.0 / (1)

  • 1|graph


Create composable data processing pipelines in Spark, and execute them on a cluster using simple Scala code

@springnz / No release yet / (0)

  • 1|application
  • 1|testing
  • 1|tools


Gradient boosting trees with an arbitrary user-defined loss function

@rotationsymmetry / Latest release: 0.2.1-s_2.10 (2015-11-01) / Apache-2.0 / (0)

  • 1|machine learning


Livy, a REST Spark Server for submitting jobs and code snippets

@cloudera / No release yet / (2)

  • 1|application
  • 1|REST
  • 1|interactive


SparkR extension for dplyr

@saurfang / No release yet / (0)

  • 1|sparkr
  • 1|r
  • 1|tools


C# API for Apache Spark. (Package moved to http://spark-packages.org/package/Microsoft/Mobius)

@skaarthik / No release yet / (2)

  • 1|streaming
  • 1|examples
  • 1|sql


Large-scale Machine Learning using Apache Spark

@project-mandolin / No release yet / (0)

  • 1|machine learning


Estus Scientific Library

@EstusDev / No release yet / (0)

  • 1|machine learning


XML data source for Spark SQL and DataFrames

@HyukjinKwon / Latest release: 0.1.1-s_2.10 (2015-11-19) / Apache-2.0 / (1)

  • 1|sql
  • 1|DataSource
  • 1|SparkSQL


Some utility classes for checking data quality in Spark

@FRosner / Latest release: 5.0.0-s_2.11 (2020-03-21) / Apache-2.0 / (1)

  • 1|data frames
  • 1|data quality


Library for building data products

@elyast / No release yet / (0)


k-Nearest Neighbors algorithm on Spark

@saurfang / Latest release: 0.3.0 (2020-02-06) / Apache-2.0 / (1)

  • 2|ml
  • 2|machine learning