A community index of third-party packages for Apache Spark.

Showing packages 401 - 450 out of 517

Scalable clustering library

@beckgael / No release yet / (0)

  • 1|clustering
  • 1|machine learning
  • 1|scala


Distributed version of the RELIEF-F algorithm for Apache Spark.

@sramirez / Latest release: 0.5.0 (2018-04-09) / Apache-2.0 / (0)


Smart Filtering framework for Big Data

@djgarcia / Latest release: 1.0 (2018-04-09) / Apache-2.0 / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Smart Reduction framework for Big Data

@djgarcia / Latest release: 1.0 (2018-04-09) / Apache-2.0 / (2)

  • 1|kNN
  • 1|m
  • 1|mllib


Smart Imputation. k Nearest Neighbor Imputation methods

@JMailloH / Latest release: 1.0 (2018-04-11) / Apache-2.0 / (2)

  • 1|ml
  • 1|mllib
  • 1|machine learning


FITS data source for Spark SQL and DataFrames

@JulienPeloton / No release yet / (1)

  • 1|data source


I wish I knew what I'm doing right now

@chamba / No release yet / (1)

  • 1|test01


Topological Data Analysis Package

@ognis1205 / No release yet / (0)

  • 1|ml
  • 1|topological data analysis
  • 1|machine learning


A Distributed Associative Classifier for Apache Spark MLlib

@lucaventurini / No release yet / (1)


Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.

@audienceproject / No release yet / (0)

  • 1|dynamodb
  • 1|AWS


Bagging-RandomMiner ensemble method for anomaly detection

@wuicho-pereyra / Latest release: 1.0 (2018-05-22) / Apache-2.0 / (1)

  • 1|spark
  • 1|big data
  • 1|machine learning


Binding the GDELT universe in a Spark environment

@aamend / Latest release: 2.0 (2018-06-02) / Apache-2.0 / (1)

  • 1|NLP
  • 1|gdelt
  • 1|sql


Calculate fairness metrics using Spark

@eubr-atmosphere / No release yet / (0)


Algebird's HyperLogLog support for Apache Spark.

@jklukas / Latest release: 2.1.1.1 (2018-06-27) / Apache-2.0 / (0)


An easy-to-use, scalable big data processing tool

@InterestingLab / No release yet / (0)


Algebird's HyperLogLog support for Apache Spark

@mozilla / Latest release: 2.2.0 (2018-06-29) / Apache-2.0 / (0)


Spark BigQuery connector copied from samelamin

@holamap / No release yet / (0)


Tool designed to speed up Spark applications

@marino-serna / Latest release: 1.0.1-00 (2018-07-22) / Apache-2.0 / (1)

  • 1|tool
  • 1|parallel
  • 1|scala


Google BigQuery data source for Apache Spark

@miraisolutions / Latest release: 0.1.1-s_2.11 (2019-06-07) / MIT / (2)

  • 1|google-cloud
  • 1|BigQuery


Kubernetes operator for specifying and running Apache Spark applications idiomatically on Kubernetes.

@GoogleCloudPlatform / No release yet / (0)

  • 1|application
  • 1|Kubernetes


More user-friendly spark-repl via Ammonite

@alexarchambault / No release yet / (0)


SparkMeasure is a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

@LucaCanali / No release yet / (0)


Implementation of the Batagelj-Zaversnik algorithm

@Jovic92 / No release yet / (0)

  • 1|graph


Automated machine learning for structured data

@salesforce / Latest release: 0.7.0 (2020-06-12) / BSD 3-Clause / (5)

  • 2|ml
  • 2|machine-learning
  • 2|scala


Isolation Forest on Spark

@titicaca / Latest release: v2.4.0 (2019-01-02) / Apache-2.0 / (1)

  • 1|ml
  • 1|machine learning


Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark

@titicaca / Latest release: v2.4.0 (2019-01-02) / Apache-2.0 / (1)

  • 1|ml
  • 1|machine learning


Huemul BigDataGovernance is a library that works on Spark, Hive, and HDFS. It enables the implementation of a corporate single-source-of-data strategy based on data governance best practices

@HuemulSolutions / No release yet / (1)

  • 1|spark
  • 1|sql
  • 1|database


Huemul BigDataGovernance is a library that works on Spark, Hive, and HDFS

@HuemulSolutions / No release yet / (0)


Equal Width Discretizer

@djgarcia / Latest release: 1.0 (2018-10-01) / Apache-2.0 / (1)

  • 1|discretization
  • 1|big data
  • 1|machine learning


Adaptive histogram estimation

@TiloWiklund / No release yet / (0)


pronounced sUrplus as it's simply better if not best!

@eisber / Latest release: 0.2.6 (2019-05-07) / MIT / (2)


SMOTE-BD: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data.

@majobasgall / Latest release: 0.1 (2018-11-14) / Apache-2.0 / (0)

  • 1|imbalanced
  • 1|Preprocessing
  • 1|big data


Similarity encoding of dirty categorical variables (strings)

@rakutentech / No release yet / (1)

  • 1|ml
  • 1|machine learning
  • 1|pyspark


Sample project for publishing to Spark

@oanhltko / Latest release: 1.0.3 (2018-12-12) / Apache-2.0 / (0)


This package contains the code for calculating external clustering validity indices in Spark.

@josemarialuna / No release yet / (0)


Spark select enables retrieving only required data from an object

@minio / Latest release: 2.1-s_2.11 (2019-04-04) / Apache-2.0 / (1)

  • 2|input
  • 2|library
  • 2|sql


A Spark SQL extension for applying adaptive selection ordering techniques in filtering

@kikniknik / No release yet / (0)

  • 1|sql


dummy-test-package

@MrBago / No release yet / (0)


Locality Sensitive Hashing for Apache Spark

@marufaytekin / No release yet / (0)

  • 1|clustering
  • 1|recommendation
  • 1|machine learning


Extensions for Spark ML/MLlib

@chitralverma / Latest release: 0.1 (2018-12-25) / Apache-2.0 / (1)

  • 1|ml
  • 1|machine learning


HS_FkNN: Hybrid Spill Tree Fuzzy k Nearest Neighbors.

@JMailloH / Latest release: 1.0 (2018-12-30) / Apache-2.0 / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Ensemble Estimators for Apache Spark ML

@pierrenodet / Latest release: 0.4.0 (2019-02-16) / Apache-2.0 / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Kyuubi is an enhanced edition of Apache Spark's primordial Thrift JDBC/ODBC Server.

@yaooqinn / No release yet / (1)

  • 1|Multi tenant
  • 1|Spark SQL
  • 1|Thrift Server


Greenplum Data Source for Apache Spark

@yaooqinn / No release yet / (1)

  • 1|data source
  • 1|sql


Basic framework utilities to quickly start writing production ready Apache Spark applications

@tupol / Latest release: 0.6.1 (2021-10-18) / MIT / (0)

  • 1|tools
  • 1|library
  • 1|scala


Executable Apache Spark Tools: Format Converter & SQL Processor

@tupol / Latest release: 0.4.1-s_2.11 (2020-09-12) / MIT / (0)

  • 1|streaming
  • 1|sql
  • 1|kafka


An open-source storage layer that brings scalable, ACID transactions to Apache Spark and big data workloads.

@delta-io / Latest release: 0.1.0-s_2.11 (2019-04-24) / Apache-2.0 / (1)

  • 1|streaming
  • 1|acid
  • 1|data source


A Scala-based Spark Publish/Subscribe NATS Connector

@Logimethods / Latest release: 1.0.0 (2019-06-10) / MIT / (0)

  • 1|nats
  • 1|streaming
  • 1|scala


A Spark Publish/Subscribe NATS Connector

@Logimethods / Latest release: 1.0.0 (2019-06-10) / MIT / (0)

  • 1|nats
  • 1|streaming
  • 1|java


A Spark client for creating tables from a given JSON schema

@tejeshwr / No release yet / (1)