A community index of third-party packages for Apache Spark.

Showing packages 151 - 200 out of 512

Popular ML Datasets for Spark ML (MNIST, IRIS, CIFAR)

@cookieai / Latest release: 0.1.0 (2015-12-22) / Apache-2.0 / (0)

  • 1|data source
  • 1|machine learning


Google Spreadsheets datasource for SparkSQL and DataFrames

@potix2 / Latest release: 0.6.3-s_2.11 (2019-08-21) / Apache-2.0 / (1)

  • 1|sql
  • 1|data source
  • 1|scala


Kaggle Job repository

@Lewuathe / Latest release: 0.0.1 (2015-11-22) / Apache-2.0 / (0)

  • 1|examples


A Spark package for retrieving data from Google Analytics

@crealytics / Latest release: 0.8.1 (2015-12-14) / Apache-2.0 / (0)


A Spark package to approximate the diameter of large graphs

@Cecca / Latest release: 0.2.0-s_2.11 (2017-03-09) / Apache-2.0 / (1)

  • 1|graph


Spark Netezza Connector

@SparkTC / Latest release: 0.1.1-s_2.10 (2016-02-06) / Apache-2.0 / (0)


Spark package to read and write PLY, LAS and XYZ lidar point clouds using Spark SQL.

@IGNF / Latest release: 0.1.0-s_2.10 (2015-12-08) / Apache-2.0 / (0)

  • 1|geospatial
  • 1|data source
  • 1|sql


CityMap coding test plus 3 solutions, 1 with Spark/GraphX

@fancellu / No release yet / (0)

  • 1|graph
  • 1|example


Benford Analysis for Spark package.

@dvgodoy / Latest release: v0.0.1 (2015-12-13) / Apache-2.0 / (0)


A library for querying Google AdWords data with Apache Spark, for Spark SQL and DataFrames

@crealytics / Latest release: 0.8.2 (2015-12-14) / Apache-2.0 / (0)


DataFrame support for HBase

@zhzhan / Latest release: 0.0.11-1.6.1-s_2.10 (2016-04-05) / Apache-2.0 / (1)


Node.js bindings for Apache Spark

@henridf / No release yet / (0)


StatsD Metrics Reporter for Spark metrics

@vidhyaarvind / No release yet / (0)


dllib is a deep learning module running on Apache Spark.

@Lewuathe / Latest release: 0.0.9 (2017-01-10) / Apache-2.0 / (1)

  • 1|deep learning
  • 1|machine learning


ScalaCheck for Spark

@juanrh / No release yet / (0)

  • 1|streaming
  • 1|testing
  • 1|tools


NCG acceleration of ALS for computing low-rank matrix factorizations for Collaborative Filtering

@mbhynes / No release yet / (0)


A collection of fundamental statistics implemented on Apache Spark

@hhbyyh / No release yet / (0)

  • 1|statistics
  • 1|machine learning


Spark connector for Ryft ONE

@getryft / Latest release: 0.9.0 (2017-04-04) / other license / (1)

  • 1|search
  • 1|pyspark
  • 1|scala


Implementations of Hierarchical Dirichlet Process (HDP) on Spark

@tund / No release yet / (0)


Modular, non-linear data pipeline framework for Spark

@unchartedsoftware / Latest release: 0.9.7 (2016-02-24) / BSD 3-Clause / (0)

  • 1|etl
  • 1|data processing


Spark connector for SFTP

@springml / Latest release: 1.1.3 (2018-10-01) / Apache-2.0 / (2)

  • 1|data source


A command-line tool for launching Apache Spark clusters.

@nchammas / No release yet / (1)

  • 1|tools
  • 1|ec2
  • 1|deployment


R OpenCPU Spark Executor (ROSE) Library

@onetapbeyond / Latest release: 1.0 (2016-01-11) / Apache-2.0 / (0)

  • 1|analytics
  • 1|r
  • 1|statistics


Assess binary classifier calibration (i.e., how well classifier outputs match observed class proportions) in Spark

@robert-dodier / No release yet / (0)

  • 1|machine learning


AnticipatoRy Complex Adaptive Network Extrapolation (ARCANE) Library Apache Spark Harness

@drmichaelnorth / Latest release: 1.0.0 (2016-01-20) / BSD 3-Clause / (0)


Scikit-learn integration package for Apache Spark

@databricks / Latest release: 0.2.3 (2017-09-29) / BSD 3-Clause / (1)

  • 2|machine learning
  • 1|ml
  • 1|scikit-learn


Compute Wilcoxon-Mann-Whitney rank sum statistic in Apache Spark

@robert-dodier / No release yet / (0)


Data source for querying SPARQL endpoints

@USU-Research / Latest release: 1.0.0-beta1-s_2.10 (2016-01-27) / Apache-2.0 / (0)

  • 1|data source
  • 1|sparql
  • 1|sql


NetFlow data source for Spark SQL and DataFrames

@sadikovi / Latest release: 2.1.0-s_2.12 (2020-12-24) / Apache-2.0 / (2)

  • 1|input
  • 1|library
  • 1|sql


Apache Spark AWS Lambda Executor (SAMBA)

@onetapbeyond / Latest release: 1.0 (2016-01-31) / Apache-2.0 / (0)

  • 1|API
  • 1|AWS
  • 1|REST


Spark uploader for S3

@knoldus / No release yet / (1)

  • 2|data source
  • 1|scala


Large-scale, distributed graph processing made easy! Load your graph from multiple formats and compute measures (and more).

@sparkling-graph / Latest release: 0.0.7 (2017-05-16) / BSD 2-Clause / (5)

  • 4|graph
  • 3|library
  • 2|example


A Distributed Evolutionary Multivariate Discretizer (DEMD)

@sramirez / Latest release: 1.0 (2016-02-04) / Apache-2.0 / (2)

  • 1|discretization
  • 1|machine learning


kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data.

@JMailloH / Latest release: 3.0 (2016-07-12) / Apache-2.0 / (4)

  • 2|ml
  • 2|mllib
  • 2|machine learning


PCARD ensemble method. Ensemble of decision trees based on Random Discretization and Principal Components Analysis.

@djgg / Latest release: 1.3 (2018-04-05) / Apache-2.0 / (3)

  • 1|machine learning
  • 1|ensemble
  • 1|mllib


JMS spark receiver

@tbfenet / Latest release: 0.2.1-s_2.11 (2016-11-23) / Apache-2.0 / (0)

  • 2|streaming


Kaggle's click-through rate prediction with Pipeline API

@yu-iskw / Latest release: 1.1 (2016-02-10) / Apache-2.0 / (0)

  • 1|ml
  • 1|example


The Official Couchbase Spark Connector

@couchbase / Latest release: 2.2.0 (2017-09-20) / Apache-2.0 / (2)

  • 1|streaming
  • 1|library
  • 1|sql


Approximate nearest neighbor search using locality-sensitive hashing

@karlhigley / Latest release: 0.2.2 (2016-07-05) / MIT / (0)

  • 1|lsh
  • 1|machine learning


CRF-Spark

@hqzizania / No release yet / (0)

  • 1|NLP
  • 1|mllib


This tutorial provides a quick introduction to using Spark

@rklick-solutions / No release yet / (2)

  • 2|RDD
  • 2|spark
  • 2|Spark SQL


Compute graphs using Apache GraphX.

@rklick-solutions / No release yet / (2)

  • 2|spark
  • 2|sbt
  • 2|scala


Scalable deep learning running Caffe inside Spark executors with peer-to-peer communication

@yahoo / No release yet / (1)

  • 1|deep learning
  • 1|caffe
  • 1|machine learning


SnappyData: OLTP + OLAP Database built on Apache Spark

@SnappyDataInc / Latest release: 1.2.0-s_2.11 (2020-02-07) / Apache-2.0 / (4)

  • 2|database
  • 1|data source
  • 1|sql


GraphFrames: DataFrame-based Graphs

@graphframes / Latest release: 0.8.3-spark3.5-s_2.13 (2023-10-07) / Apache-2.0 / (10)

  • 5|graph
  • 4|DataFrame


Spark-lever is based on Spark Streaming; it is a proactive, capability-aware load-balancing system for batch stream processing on heterogeneous clusters.

@trueyao / No release yet / (2)

  • 2|streaming


Pregel implementation of PageRank in Spark

@uzink / No release yet / (0)


Deprecated, please see bahir/sql-cloudant

@cloudant-labs / Latest release: 2.0.0-s_2.11 (2016-09-23) / Apache-2.0 / (1)

  • 1|IBM
  • 1|Cloudant


k Betweenness Centrality algorithm for Spark using GraphX

@dmarcous / Latest release: 1.0-s_2.10 (2016-02-29) / Apache-2.0 / (3)

  • 2|graph
  • 1|centrality


Spark MLlib wrapper around Snowball stemming

@master / Latest release: 0.2.1 (2018-11-28) / BSD 2-Clause / (0)

  • 1|machine learning