Spark Packages

A community index of third-party packages for Apache Spark.

Showing packages 401 - 450 out of 518

Clustering4Ever-mirror

Scalable clustering library

@beckgael / No release yet / (0)

spark-RELIEFFC-fselection

Distributed version of RELIEF-F algorithm for Apache Spark.

@sramirez / Latest release: 0.5.0 (2018-04-09) / Apache-2.0 / (0)

SmartFiltering

Smart Filtering framework for Big Data

@djgarcia / Latest release: 1.0 (2018-04-09) / Apache-2.0 / (2)

SmartReduction

Smart Reduction framework for Big Data

@djgarcia / Latest release: 1.0 (2018-04-09) / Apache-2.0 / (2)

Smart_Imputation

Smart Imputation. k Nearest Neighbor Imputation methods

@JMailloH / Latest release: 1.0 (2018-04-11) / Apache-2.0 / (2)

spark-fits

FITS data source for Spark SQL and DataFrames

@JulienPeloton / No release yet / (1)

sparkTest

I wish I knew what I'm doing right now

@chamba / No release yet / (1)

spark-tda

Topological Data Analysis Package

@ognis1205 / No release yet / (0)

dac

A Distributed Associative Classifier for Apache Spark MLlib

@lucaventurini / No release yet / (1)

spark-dynamodb

Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.

@audienceproject / No release yet / (0)

Bagging-RandomMiner

Bagging-RandomMiner ensemble method for anomaly detection

@wuicho-pereyra / Latest release: 1.0 (2018-05-22) / Apache-2.0 / (1)

spark-gdelt

Binding the GDELT universe in a Spark environment

@aamend / Latest release: 2.0 (2018-06-02) / Apache-2.0 / (1)

spark-fairness

Calculate fairness metrics using Spark

@eubr-atmosphere / No release yet / (0)

spark-hyperloglog

Algebird's HyperLogLog support for Apache Spark.

@jklukas / Latest release: 2.1.1.1 (2018-06-27) / Apache-2.0 / (0)

waterdrop

An easy-to-use, scalable big data processing tool

@InterestingLab / No release yet / (0)

spark-hyperloglog

Algebird's HyperLogLog support for Apache Spark

@mozilla / Latest release: 2.2.0 (2018-06-29) / Apache-2.0 / (0)

spark-bqs

Spark BigQuery connector copied from samelamin

@holamap / No release yet / (0)

ParallelTool

Tool designed to speed up Spark applications

@marino-serna / Latest release: 1.0.1-00 (2018-07-22) / Apache-2.0 / (1)

spark-bigquery

Google BigQuery data source for Apache Spark

@miraisolutions / Latest release: 0.1.1-s_2.11 (2019-06-07) / MIT / (2)

spark-on-k8s-operator

Kubernetes operator for specifying and running Apache Spark applications idiomatically on Kubernetes.

@GoogleCloudPlatform / No release yet / (0)

ammonite-spark

More user-friendly spark-repl via Ammonite

@alexarchambault / No release yet / (0)

sparkMeasure

SparkMeasure is a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

@LucaCanali / No release yet / (0)

social-network-analysis-community-detection

Implementation of the Batagelj-Zaversnik algorithm

@Jovic92 / No release yet / (0)

TransmogrifAI

Automated machine learning for structured data

@salesforce / Latest release: 0.7.0 (2020-06-12) / BSD 3-Clause / (5)

spark-iforest

Isolation Forest on Spark

@titicaca / Latest release: v2.4.0 (2019-01-02) / Apache-2.0 / (1)

spark-gbtlr

Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark

@titicaca / Latest release: v2.4.0 (2019-01-02) / Apache-2.0 / (1)

Bigdata-Governance

Huemul BigDataGovernance is a library that works on top of Spark, Hive, and HDFS. It enables the implementation of a corporate single-source-of-data strategy based on data governance best practices

@HuemulSolutions / No release yet / (1)

huemul-bigdatagovernance

Huemul BigDataGovernance is a library that works on top of Spark, Hive, and HDFS

@HuemulSolutions / No release yet / (0)

Equal-Width-Discretizer

Equal Width Discretizer

@djgarcia / Latest release: 1.0 (2018-10-01) / Apache-2.0 / (1)

SparkDensityTree

Adaptive histogram estimation

@TiloWiklund / No release yet / (0)

sarplus

pronounced sUrplus as it's simply better if not best!

@eisber / Latest release: 0.2.6 (2019-05-07) / MIT / (2)

smote-bd

SMOTE-BD: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data.

@majobasgall / Latest release: 0.1 (2018-11-14) / Apache-2.0 / (0)

spark-dirty-cat

Similarity encoding of dirty categorical variables (strings)

@rakutentech / No release yet / (1)

sample_spark

Sample project for publishing to Spark

@oanhltko / Latest release: 1.0.3 (2018-12-12) / Apache-2.0 / (0)

ExternalValidity

This package contains the code for calculating external clustering validity indices in Spark.

@josemarialuna / No release yet / (0)

spark-select

Spark select enables retrieving only required data from an object

@minio / Latest release: 2.1-s_2.11 (2019-04-04) / Apache-2.0 / (1)

spark-adaptive_filtering

A Spark SQL extension for applying adaptive selection ordering techniques in filtering

@kikniknik / No release yet / (0)

dummy-test-package

@MrBago / No release yet / (0)

lsh-spark

Locality Sensitive Hashing for Apache Spark

@marufaytekin / No release yet / (0)

sparkml-extensions

Extensions for Spark ML/MLlib

@chitralverma / Latest release: 0.1 (2018-12-25) / Apache-2.0 / (1)

HS_FkNN

HS_FkNN: Hybrid Spill Tree Fuzzy k Nearest Neighbors.

@JMailloH / Latest release: 1.0 (2018-12-30) / Apache-2.0 / (1)

spark-ensemble

Ensemble Estimators for Apache Spark ML

@pierrenodet / Latest release: 0.4.0 (2019-02-16) / Apache-2.0 / (1)

kyuubi

Kyuubi is an enhanced edition of Apache Spark's primordial Thrift JDBC/ODBC Server.

@yaooqinn / No release yet / (1)

spark-greenplum

Greenplum Data Source for Apache Spark

@yaooqinn / No release yet / (1)

spark-utils

Basic framework utilities to quickly start writing production-ready Apache Spark applications

@tupol / Latest release: 0.6.1 (2021-10-18) / MIT / (0)

spark-tools

Executable Apache Spark Tools: Format Converter & SQL Processor

@tupol / Latest release: 0.4.1-s_2.11 (2020-09-12) / MIT / (0)

delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark and big data workloads.

@delta-io / Latest release: 0.1.0-s_2.11 (2019-04-24) / Apache-2.0 / (1)

nats-connector-spark-scala

A Scala-based Spark Publish/Subscribe NATS Connector

@Logimethods / Latest release: 1.0.0 (2019-06-10) / MIT / (0)

nats-connector-spark

A Spark Publish/Subscribe NATS Connector

@Logimethods / Latest release: 1.0.0 (2019-06-10) / MIT / (0)

spark-client

A Spark client for creating tables using a given JSON schema

@tejeshwr / No release yet / (1)