A community index of third-party packages for Apache Spark.

Showing packages 451 - 500 out of 517

Iterative Ensemble Noise Filter for Big Data

@djgarcia / Latest release: 1.0 (2019-06-10) / Apache-2.0 / (1)

  • 1|machine learning
  • 1|big data
  • 1|data mining


Stream Data Mining Library for Spark Streaming

@huawei-noah / Latest release: 0.0.1 (2019-07-21) / Apache-2.0 / (1)

  • 1|streaming
  • 1|machine learning
  • 1|scala


The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

@archivesunleashed / Latest release: 0.18.0 (2019-08-21) / Apache-2.0 / (0)

  • 1|pyspark
  • 1|tools
  • 1|Web archives


Testing spark-packages publishing, please ignore

@hmgomes / Latest release: 0.0.2 (2019-07-21) / Apache-2.0 / (0)


Hive ACID datasource for Apache Spark

@qubole / Latest release: 0.4.0-s_2.11 (2019-07-26) / Apache-2.0 / (0)

  • 1|spark
  • 1|data source
  • 1|DataSource


mllib-stacking-bagging is an implementation of the ensemble methods stacking and bagging using the LogisticRegression, NaiveBayes, and DecisionTree classifiers provided by the RDD-based MLlib API.

@Pse00004 / No release yet / (0)


P-spectrum embedding and sequence relaxation for NLP in Spark

@sirCamp / Latest release: 1.0.0 (2019-08-07) / Apache-2.0 / (0)

  • 1|ml
  • 1|spark
  • 1|machine learning


Spark-Scala Implementation of Means

@mfleming99 / No release yet / (0)


Spark Library for Bulk Loading into Cassandra

@joswlv / No release yet / (1)

  • 1|bulkload
  • 1|sstable
  • 1|cassandra


Extension to the standard K-Means implementation of the Spark ML library

@tupol / Latest release: 0.0.1 (2019-09-04) / MIT / (0)

  • 1|ml
  • 1|kmeans
  • 1|anomalies


Model Hopper Parallelism (MOP) for Efficient and Reproducible Model Selection on Apache Spark

@scnakandala / No release yet / (0)


Enabling continuous delivery and improvement of Spark pipeline models through devops methodology and ML governance

@aamend / Latest release: 1.1 (2019-10-17) / Apache-2.0 / (1)


Complexity metrics for big data problems.

@JMailloH / Latest release: 1.0 (2019-10-17) / Apache-2.0 / (1)

  • 1|ml
  • 1|mllib
  • 1|machine learning


Multi-class performance metric AUC_mu (aucmu) for Apache Spark

@poweihuang / Latest release: 1.0.0 (2019-10-21) / MIT / (1)

  • 1|machine learning
  • 1|pyspark


JDBC source for Spark Structured Streaming

@sutugin / No release yet / (1)

  • 1|jdbc
  • 1|data source
  • 1|sql


Large-Scale Multi-View Learning in PySpark

@jpdunc23 / No release yet / (0)

  • 1|machine learning
  • 1|pyspark


A Scala memoization library ready for Spark

@EnzoBnl / No release yet / (1)


Utility classes to extend and generalize Spark's ML pipeline framework

@tnixon / No release yet / (0)

  • 1|pipelines
  • 1|etl
  • 1|machine learning


Add PySpark support for reading AWS DynamoDB streams

@kberbic / No release yet / (0)


DynamoDB source for Spark Structured Streaming

@kolia1985 / Latest release: 0.0.2 (2019-11-24) / Apache-2.0 / (1)

  • 1|streaming
  • 1|data source


SD_DeTE is a novel Smart Data driven Decision Trees Ensemble methodology for addressing the imbalanced classification problem in Big Data domains

@djgarcia / Latest release: 1.0 (2019-12-03) / Apache-2.0 / (1)

  • 1|machine learning
  • 1|big data
  • 1|data mining


An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.

@archivesunleashed / No release yet / (0)

  • 1|pyspark
  • 1|tools
  • 1|Digital Humanities


Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines

@qubole / Latest release: 0.5.3 (2020-02-13) / Apache-2.0 / (0)

  • 1|streaming
  • 1|sql
  • 1|scala


A library to read DICOM files into a Spark SQL DataFrame.

@abzoobabd / No release yet / (1)


Online latent state estimation with Spark

@ozancicek / Latest release: 0.3.0 (2020-05-20) / Apache-2.0 / (1)

  • 1|streaming
  • 1|machine learning
  • 1|pyspark


A simplified, consistent, minimalistic layer over Apache Spark

@music-of-the-ainur / No release yet / (0)


Simplifies the task of parsing/flattening complex semi-structured data

@music-of-the-ainur / No release yet / (0)


Custom state store providers for Apache Spark

@chermenin / Latest release: 0.2 (2020-04-24) / Apache-2.0 / (0)

  • 1|stateful
  • 1|streaming


Structured Streaming State Tools for Apache Spark

@HeartSaVioR / Latest release: 0.3.0 (2020-05-21) / Apache-2.0 / (0)

  • 1|structured streaming
  • 1|state
  • 1|data source


Kafka offset committer for Apache Spark Structured Streaming queries

@HeartSaVioR / No release yet / (0)


Ensembles Development

@amm00254 / No release yet / (0)


A privacy preserving library for Apache Spark

@ThaminduR / No release yet / (0)

  • 1|anonymization
  • 1|k-anonymity
  • 1|l-diversity


Query deeply nested and huge directories from Spark efficiently

@salva / Latest release: 0.10 (2020-06-17) / Apache-2.0 / (1)

  • 1|file system
  • 1|glob
  • 1|find


Apache Spark ETL Utilities

@mayur2810 / No release yet / (0)


Spark access to Common Information Model (CIM) files

@derrickoswald / No release yet / (0)


Kotlin language bindings and several extensions for Apache Spark

@JetBrains / No release yet / (1)


Scala and Spark library focused on reading OpenStreetMap PBF files.

@simplexspatial / Latest release: 1.0.7 (2021-03-27) / MIT / (0)


Officially supported, Apache 2 licensed Neo4j Connector for Apache Spark.

@neo4j-contrib / Latest release: 4.0.1 (2021-04-12) / Apache-2.0 / (0)


Officially supported, Apache 2 licensed Neo4j Connector for Apache Spark.

@neo4j-contrib / Latest release: 4.0.1_for_spark_3 (2021-04-12) / Apache-2.0 / (0)


Moved to https://spark-packages.org/package/mjuez/approx-smote

@mjuez / Latest release: 0.3 (2020-11-13) / Apache-2.0 / (0)


Approximated SMOTE for Big Data under the Spark Framework.

@mjuez / Latest release: 1.1.2 (2022-04-27) / Apache-2.0 / (1)

  • 1|ml
  • 1|big data
  • 1|data mining


Spark-based graph processing system demo using GPUs

@Kamosphere / Latest release: 1.1 (2020-12-22) / Apache-2.0 / (3)

  • 1|graph
  • 1|CUDA
  • 1|GPU


AnomalyDSD is a Spark package comprising four dynamic and static anomaly detection algorithms for Big Data

@ari-dasci / Latest release: 1.0 (2021-02-17) / Apache-2.0 / (1)

  • 1|big data
  • 1|anomaly detection


Sequence Data Source for Apache Spark

@garawalid / Latest release: 0.2.0 (2021-03-20) / Apache-2.0 / (0)

  • 1|input
  • 1|data source
  • 1|sql


Rotation Forest implementation for Big Data on Apache Spark

@mjuez / Latest release: 1.0.0 (2021-03-23) / Apache-2.0 / (1)

  • 1|ml
  • 1|ensemble
  • 1|big data


Test for spark-packages

@bozhang2820 / Latest release: 0.0.8 (2023-09-25) / Apache-2.0 / (0)


spark packages test

@linhongliu-db / Latest release: 0.0.7 (2021-04-30) / Apache-2.0 / (0)


SnappyData: OLTP + OLAP Database built on Apache Spark

@TIBCOSoftware / No release yet / (1)


Relational query engine that unites SparkSQL and GORpipe into a single declarative query framework.

@gorpipe / Latest release: 3.10.2 (2021-05-09) / Apache-2.0 / (0)


Rumble: JSONiq for Apache Spark

@RumbleDB / No release yet / (1)

  • 1|Applications
  • 1|tools
  • 1|nosql