A community index of third-party packages for Apache Spark.

Showing packages 351 - 400 out of 517

https://spark-packages.org/package/iaja/scalaLDAvis

@Mageswaran1989 / No release yet / (0)

  • 1|Spark MLLib
  • 1|LDA
  • 1|Topic Modeling


Scala-Spark port of https://github.com/bmabey/pyLDAvis for Apache Spark LDA Topic Modelling Visualisation

@iaja / No release yet / (0)


A Scala / Java / Python library for cleansing, transforming and preparing large datasets for ML operations on Apache Spark.

@data-commons / No release yet / (1)

  • 1|transforming
  • 1|data-cleaning
  • 1|data-sanitization


Deriving Spark DataFrame schemas from case classes

@BenFradet / Latest release: 0.2.0 (2018-02-26) / Apache-2.0 / (0)

  • 1|sql


Spark GraphX library to detect causalities across time related events

@aamend / Latest release: 1.0 (2017-07-14) / Apache-2.0 / (1)

  • 1|application
  • 1|graph


Spark Streaming library for reading chat messages from Twitch.tv

@agapic / Latest release: 1.0.0 (2017-07-20) / Apache-2.0 / (1)

  • 1|Twitch
  • 1|spark streaming
  • 1|IRC


Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.

@ironmussa / Latest release: 1.1.0 (2017-10-25) / Apache-2.0 / (2)

  • 1|machine learning
  • 1|tools
  • 1|pyspark


Routines and data structures for using isarn-sketches idiomatically in Apache Spark

@isarn / No release yet / (0)


Data Exploration in PySpark made easy - Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.

@Bergvca / Latest release: 0.1.4 (2017-08-02) / Apache-2.0 / (0)

  • 1|python
  • 1|pyspark
  • 1|Histogram


HSpark - High performance HBase / Spark SQL engine

@bomeng / Latest release: 2.2.0 (2017-07-27) / Apache-2.0 / (0)

  • 1|HSpark
  • 1|spark
  • 1|hbase


MRQAR is a new generic parallel framework to discover quantitative association rules.

@djgarcia / Latest release: 1.0 (2017-07-28) / Apache-2.0 / (2)

  • 1|association rules
  • 1|big data
  • 1|machine learning


Affinity Propagation on Spark

@viirya / Latest release: 1.0 (2017-07-29) / MIT / (0)

  • 1|clustering
  • 1|affinity propagation
  • 1|machine learning


A new scheduler for Spark Streaming that is aware of task sizes and node capabilities

@u2009cf / Latest release: 1.0.0 (2017-08-14) / Apache-2.0 / (1)

  • 1|streaming
  • 1|scheduler
  • 1|core


Find small connected components in a graph using Apache Spark

@mguaypaq / No release yet / (0)


Image schema and ingress support for Spark DataFrames

@Microsoft / Latest release: 0.1 (2017-10-03) / Apache-2.0 / (0)


Spark functions to run popular phonetic and string matching algorithms

@MrPowers / Latest release: 0.2.0 (2019-01-27) / Apache-2.0 / (1)


Spark data source and Spark DStream connector for Apache CouchDB/Cloudant

@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)

  • 1|python
  • 1|streaming
  • 1|sql


Spark structured streaming data source for Akka

@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)

  • 1|streaming
  • 1|sql
  • 1|scala


Spark DStream connector for Google Pub/Sub

@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)

  • 1|streaming
  • 1|scala


Spark implementation of k-medoids clustering algorithm

@tdebatty / Latest release: 0.1.2 (2017-09-24) / MIT / (1)

  • 1|clustering
  • 1|machine learning


Test repo to upload to spark packages

@showy / No release yet / (0)


Natural Language Processing Library for Apache Spark.

@JohnSnowLabs / Latest release: 3.0.1 (2021-04-02) / Apache-2.0 / (5)

  • 2|NLP
  • 2|machine-learning
  • 2|pyspark


An Apache Spark package containing a distributed implementation of the classical ReliefF algorithm.

@rauljosepalma / No release yet / (0)


Provides different FeatureSelection methods as Spark MLlib PipelineStages and a VectorMerger for merging different VectorColumns without duplicates.

@MarcKaminski / No release yet / (0)


RasterFrames brings the power of Spark DataFrames to geospatial raster data, empowered by the map algebra and tile layer operations of GeoTrellis.

@s22s / No release yet / (0)

  • 2|geotrellis
  • 2|raster
  • 2|dataframes


Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL protocol

@maropu / Latest release: 0.1.7-spark2.4 (2018-11-14) / Apache-2.0 / (0)


Spatial In-Memory Big data Analytics

@InitialDLab / No release yet / (1)

  • 1|simba
  • 1|DataFrame
  • 1|spatial


Linear Algebra for Graph Algorithms

@IBM / No release yet / (0)


A Spark implementation of k-Core decomposition of a graph

@DMGroup-IUPUI / No release yet / (0)


PhysOnline: An Open Source Machine Learning Pipeline for Real-Time Analysis of Streaming Physiological Waveform

@rkamaleswaran / No release yet / (1)

  • 1|machine learning
  • 1|scala
  • 1|real-time


Big data Analysis

@sathishsettu / No release yet / (0)


Support Highcharts for Apache Spark

@knockdata / Latest release: 0.6.5 (2017-12-14) / Apache-2.0 / (0)

  • 1|visualization
  • 1|highcharts


An optimizer rule that provides SQL Standard Authorization for Apache Spark

@yaooqinn / Latest release: 2.1.1 (2018-11-01) / Apache-2.0 / (2)

  • 1|hive
  • 1|authorization
  • 1|sql


Structured Streaming is a reference application showing how to easily integrate Apache Spark Structured Streaming, Apache Cassandra and Apache Kafka for fast, structured streaming computations on data.

@knoldus / Latest release: 0.1.0 (2018-01-05) / Apache-2.0 / (1)

  • 1|application
  • 1|structured-streaming
  • 1|scala


Spark Structured Streaming via HTTP communication

@bluejoe2008 / No release yet / (1)

  • 1|spark streaming
  • 1|http


Fuzzy matching function in Spark

@itspawanbhardwaj / No release yet / (1)

  • 1|fuzzymatching
  • 1|spark
  • 1|fuzzy_matching


xgboost-spark pre-built for linux64 environment

@tomasatdatabricks / Latest release: 0.8.3-spark2.3-s_2.11 (2018-05-07) / Apache-2.0 / (0)


An Apache Spark package containing a distributed implementation of the popular CFS algorithm for feature selection.

@rauljosepalma / No release yet / (0)


RD2R Ensemble

@djgarcia / Latest release: 1.0 (2018-01-29) / Apache-2.0 / (2)

  • 1|mllib
  • 1|big data
  • 1|ensemble


RandomNoise: Adds class noise randomly into an RDD

@djgarcia / Latest release: 1.0 (2018-01-30) / Apache-2.0 / (2)

  • 1|noise
  • 1|Preprocessing
  • 1|big data


This package implements DFST (Distributed FastShapelet Transform). DFST is the first time series classification algorithm developed for distributed environments (Spark). This algorithm performs a shapelet transform on a data set, trains a Random Forest mod

@fjbaldan / Latest release: 1.1 (2018-01-31) / Apache-2.0 / (1)

  • 1|big data
  • 1|classification
  • 1|Time series


Xgboost Spark package pre-built for linux64 environment

@databricks / Latest release: 0.8.3-spark2.3-s_2.11 (2018-08-15) / Apache-2.0 / (1)


Sparkoscope

@ibm-research-ireland / No release yet / (1)


Massively Distributed Indexing of Time Series

@lev-a / No release yet / (0)


This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.

@josemarialuna / No release yet / (0)


GDELT universe from a Spark environment

@aamend / No release yet / (0)

  • 1|graph
  • 1|sql
  • 1|NLP


Apache Hivemall released binaries for Spark Packages

@maropu / Latest release: 0.5.1-spark2.2 (2018-04-04) / Apache-2.0 / (0)


Apache Hivemall released binaries for Spark Packages

@apache-hivemall / Latest release: 0.5.1-spark2.2 (2018-04-05) / Apache-2.0 / (0)


Qubole Sparklens tool for performance tuning Apache Spark

@qubole / Latest release: 0.3.2-s_2.11 (2020-05-04) / Apache-2.0 / (2)


Transfer IBM Informix data to Apache Spark using JDBC

@jgperrin / No release yet / (1)

  • 1|java
  • 1|example
  • 1|tutorial