A community index of third-party packages for Apache Spark.

Showing packages 251 - 300 out of 512

(WIP) A set of Spark application examples for beginners that run on spark-shell

@dobachi / Latest release: 0.1.0 (2016-08-04) / Apache-2.0 / (0)

  • 1|example


A plugin to enable Apache Spark to read HDF5 files

@LLNL / Latest release: 0.0.4 (2016-09-10) / Apache-2.0 / (0)

  • 1|input
  • 1|sql
  • 1|hdf5


Write your RDDs and DStreams to Kafka seamlessly

@BenFradet / Latest release: 0.4.0 (2017-07-22) / Apache-2.0 / (0)

  • 1|streaming
  • 1|data source


Rich Spark adds more to Apache Spark

@mashin-io / No release yet / (0)

  • 1|ml
  • 1|library
  • 1|streaming


An implementation of the Markov Clustering algorithm for Spark in Scala

@joandre / Latest release: 1.0.0-s_2.11 (2016-08-15) / MIT / (0)

  • 1|graph


Ranking algorithms for Spark DataFrame

@yu-iskw / Latest release: 0.0.4 (2016-08-26) / Apache-2.0 / (0)

  • 1|ml
  • 1|machine learning
  • 1|scala


An example of Spark GraphX as an analytics engine and Cassandra as the persistence layer.

@knoldus / No release yet / (1)

  • 1|graph
  • 1|ex


Maelstrom is an open source Kafka integration with Spark that is designed to be developer friendly, high performance (millisecond stream processing), scalable (consumes messages at Spark worker nodes), and extremely reliable.

@jeoffreylim / No release yet / (0)

  • 1|streaming
  • 1|kafka


Geo Spatial Data Analytics on Spark

@yuanzhaoYZ / No release yet / (0)


Apache HBase Connector for Apache Spark

@hortonworks-spark / No release yet / (0)


This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.

@knoldus / No release yet / (0)


Configure nginx on the master node as a reverse proxy to the Apache Spark web UI and history server. No more SSH SOCKS proxy or tunnel.

@ekasitk / No release yet / (0)

  • 1|tool
  • 1|deployment


Scalable implementation of artificial neural networks for Spark deep learning

@avulanov / Latest release: 1.0.0 (2016-09-09) / Apache-2.0 / (1)

  • 1|deep learning
  • 1|machine learning


A Scala implementation of Annoy, which searches for the nearest neighbors of a given query point. Ann4s also provides a DataFrame-based API for Apache Spark.

@mskimm / No release yet / (0)

  • 1|kNN
  • 1|machine learning


A library for time series analysis on Apache Spark

@sryza / Latest release: 0.4.1 (2016-11-15) / Apache-2.0 / (0)


A library to load data into Spark SQL DataFrames from Hive using LLAP

@hortonworks-spark / No release yet / (0)


An indexed columnar file format for interactive query

@jackylk / No release yet / (0)


Snowflake Data Source for Apache Spark.

@snowflakedb / Latest release: 2.5.1-spark_2.4 (2019-08-01) / Apache-2.0 / (2)

  • 1|sql
  • 1|snowflake
  • 1|da


A library of Java and Scala examples for Spark 2.0.0

@knoldus / No release yet / (1)

  • 1|java
  • 1|example
  • 1|scala


test

@kanterov / Latest release: 0.3.1 (2016-10-03) / Apache-2.0 / (0)


Apache Spark datasource for OrientDB

@sbcd90 / No release yet / (1)

  • 1|orientdb
  • 1|spark datasource


Stream data analysis on IoT-generated data via Apache Spark

@shiv4nsh / No release yet / (0)

  • 1|streaming
  • 1|spark
  • 1|example


Spark Connector for Workday

@springml / Latest release: 1.1.0 (2017-03-10) / Apache-2.0 / (1)


An example of Spark ML and StanfordNLP for topic discovery using LDA clustering

@shiv4nsh / No release yet / (0)

  • 1|clustering
  • 1|spark
  • 1|scala


Spark connector for BigQuery

@appsflyer-dev / Latest release: 0.1.1 (2017-01-29) / Apache-2.0 / (0)

  • 1|BigQuery


Standard Spark transformations

@mrpowers / No release yet / (0)


A REST API for CRUD operations on Cassandra using Apache Spark

@shiv4nsh / No release yet / (0)

  • 1|application
  • 1|spark
  • 1|example


Distributed deep learning with Keras and Apache Spark.

@JoeriHermans / No release yet / (0)

  • 1|machine learning
  • 1|pyspark


A Cluster Computing System for Processing Large-Scale Spatial Data

@DataSystemsLab / No release yet / (1)

  • 1|geometry operations
  • 1|geospatial
  • 1|spatial queries


Practical utilities for Spark applications

@CeON / Latest release: 1.0.0 (2016-10-19) / Apache-2.0 / (0)


This library customizes some DataFrame outputs.

@jeanbaptistepriez / No release yet / (1)

  • 1|pyth
  • 1|jupyter
  • 1|pyspark


Greedy K-means Spark Package in Python

@Hongfu-Liu / No release yet / (0)


Spark SQL datasource for GitHub PR API

@lightcopy / Latest release: 1.3.0-s_2.10 (2016-12-25) / Apache-2.0 / (0)

  • 1|input
  • 1|library
  • 1|sql


Positive-Unlabeled Learning for Apache Spark

@ispras / No release yet / (0)

  • 1|machine learning


Spark Structured Streaming Kafka 0.8 Source Implementation

@jerryshao / No release yet / (0)


Spark NetSuite Connector

@springml / Latest release: 1.1.0 (2017-03-10) / Apache-2.0 / (2)


R interface for Apache Spark

@rstudio / No release yet / (0)


A parallel implementation of word2vec based on Spark

@chen-lin / No release yet / (1)

  • 1|machine learning


OpenStack Spark cluster deployment

@ispras / Latest release: 0.9.5 (2016-11-10) / Apache-2.0 / (0)

  • 1|tools
  • 1|deployment


A library of scalable frequent itemset mining algorithms based on Spark

@chen-lin / No release yet / (1)

  • 1|frequent itemset mining
  • 1|association rule mining
  • 1|data mining


A Spark datasource for the HadoopCryptoLedger library

@ZuInnoTe / Latest release: 1.3.2-s_2.12 (2021-12-24) / Apache-2.0 / (1)

  • 1|hadoocryptoledger
  • 1|data source
  • 1|bitcoin


Launching and controlling Spark on HPC clusters made easy

@rokroskar / No release yet / (0)

  • 1|hpc
  • 1|high-performance computing
  • 1|pyspark


An implementation of Adam for stochastic optimization.

@VinceShieh / Latest release: 0.1 (2016-12-13) / Apache-2.0 / (1)

  • 1|ml
  • 1|ma
  • 1|mllib


A parallel implementation of factorization machines based on Spark

@chen-lin / No release yet / (1)

  • 1|factorization machines
  • 1|machine learning


Robust and scalable join operators using a sort-merge algorithm (high data skew, low cardinality, etc.)

@hindog / Latest release: 2.0.1 (2017-04-04) / Apache-2.0 / (0)

  • 1|core


This code implements a spectral learning method (third-order tensor decomposition) for learning an LDA topic model on Spark.

@FurongHuang / Latest release: 1.0 (2016-12-04) / Apache-2.0 / (1)

  • 1|machine learning


Kraps: safe, robust and reliable data pipelines over Apache Spark.

@krapsh / Latest release: 0.1.9-s_2.11 (2017-01-16) / Apache-2.0 / (0)

  • 1|API
  • 1|haskell
  • 1|REST


A simple greedy parallel PySpark implementation of the 0-1 knapsack algorithm.

@drulm / No release yet / (0)


Spark Zuora Connector

@springml / Latest release: 1.1.0 (2017-03-10) / Apache-2.0 / (1)


Distributed Linear Programming Solver with Apache Spark

@ehsanmok / No release yet / (1)

  • 1|machine learning
  • 1|optimization
  • 1|convex