A community index of third-party packages for Apache Spark.
Showing packages 351 - 400 out of 517
scalaLDAvis
https://spark-packages.org/package/iaja/scalaLDAvis
@Mageswaran1989 / No release yet / (0)
scalaLDAvis
Scala-Spark port of https://github.com/bmabey/pyLDAvis for Apache Spark LDA Topic Modelling Visualisation
@iaja / No release yet / (0)
prep-buddy
A Scala / Java / Python library for cleansing, transforming and preparing large datasets for ML operations on Apache Spark.
@data-commons / No release yet / (1)
struct-type-encoder
Deriving Spark DataFrame schemas from case classes
@BenFradet / Latest release: 0.2.0 (2018-02-26) / Apache-2.0 / (0)
Twitch-Streamer
Spark Streaming library for reading chat messages from Twitch.tv
@agapic / Latest release: 1.0.0 (2017-07-20) / Apache-2.0 / (1)
Optimus
Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.
@ironmussa / Latest release: 1.1.0 (2017-10-25) / Apache-2.0 / (2)
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
@isarn / No release yet / (0)
pyspark_dist_explore
Data exploration in PySpark made easy. Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.
@Bergvca / Latest release: 0.1.4 (2017-08-02) / Apache-2.0 / (0)
SparkAffinityPropagation
Affinity Propagation on Spark
@viirya / Latest release: 1.0 (2017-07-29) / MIT / (0)
spark-radar
A scheduler for Spark Streaming that is aware of task sizes and node capabilities
@u2009cf / Latest release: 1.0.0 (2017-08-14) / Apache-2.0 / (1)
small-components
Find small connected components in a graph using Apache Spark
@mguaypaq / No release yet / (0)
spark-images
Image schema and ingress support for Spark DataFrames
@Microsoft / Latest release: 0.1 (2017-10-03) / Apache-2.0 / (0)
spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
@MrPowers / Latest release: 0.2.0 (2019-01-27) / Apache-2.0 / (1)
bahir:sql-cloudant
Spark data source and Spark DStream connector for Apache CouchDB/Cloudant
@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)
bahir:sql-streaming-akka
Spark structured streaming data source for Akka
@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)
bahir:streaming-pubsub
Spark DStream connector for Google Pub/Sub
@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)
spark-kmedoids
Spark implementation of k-medoids clustering algorithm
@tdebatty / Latest release: 0.1.2 (2017-09-24) / MIT / (1)
spark-nlp
Natural Language Processing Library for Apache Spark.
@JohnSnowLabs / Latest release: 3.0.1 (2021-04-02) / Apache-2.0 / (5)
DiReliefF
An Apache Spark package containing a distributed implementation of the classical ReliefF algorithm.
@rauljosepalma / No release yet / (0)
spark-FeatureSelection
Provides different FeatureSelection methods as Spark MLlib PipelineStages and a VectorMerger for merging different VectorColumns without duplicates.
@MarcKaminski / No release yet / (0)
raster-frames
RasterFrames brings the power of Spark DataFrames to geospatial raster data, empowered by the map algebra and tile layer operations of GeoTrellis.
@s22s / No release yet / (0)
spark-sql-server
Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL protocol
@maropu / Latest release: 0.1.7-spark2.4 (2018-11-14) / Apache-2.0 / (0)
Spark-kCore
A Spark implementation of k-core decomposition of a graph
@DMGroup-IUPUI / No release yet / (0)
PhysOnline
PhysOnline: An Open Source Machine Learning Pipeline for Real-Time Analysis of Streaming Physiological Waveform
@rkamaleswaran / No release yet / (1)
spark-highcharts
Support Highcharts for Apache Spark
@knockdata / Latest release: 0.6.5 (2017-12-14) / Apache-2.0 / (0)
spark-authorizer
An optimizer rule that provides SQL Standard Authorization for Apache Spark
@yaooqinn / Latest release: 2.1.1 (2018-11-01) / Apache-2.0 / (2)
structured-streaming-application
A reference application showing how to integrate Apache Spark Structured Streaming, Apache Cassandra, and Apache Kafka for fast streaming computations on data.
@knoldus / Latest release: 0.1.0 (2018-01-05) / Apache-2.0 / (1)
spark-http-stream
Spark Structured Streaming via HTTP communication
@bluejoe2008 / No release yet / (1)
xgboost-spark-linux64
xgboost-spark pre-built for linux64 environment
@tomasatdatabricks / Latest release: 0.8.3-spark2.3-s_2.11 (2018-05-07) / Apache-2.0 / (0)
DiCFS
An Apache Spark package containing a distributed implementation of the popular CFS algorithm for feature selection.
@rauljosepalma / No release yet / (0)
RandomNoise
RandomNoise: Adds class noise randomly into an RDD
@djgarcia / Latest release: 1.0 (2018-01-30) / Apache-2.0 / (2)
DFST
This package implements DFST (Distributed FastShapelet Transform). DFST is the first time series classification algorithm developed for distributed environments (Spark). This algorithm performs a shapelet transform on a data set, trains a Random Forest mod
@fjbaldan / Latest release: 1.1 (2018-01-31) / Apache-2.0 / (1)
xgboost-linux64
Xgboost Spark package pre-built for linux64 environment
@databricks / Latest release: 0.8.3-spark2.3-s_2.11 (2018-08-15) / Apache-2.0 / (1)
ClusterIndices
This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bouldin and WSSSE indices.
@josemarialuna / No release yet / (0)
apache-hivemall
Apache Hivemall released binaries for Spark Packages
@maropu / Latest release: 0.5.1-spark2.2 (2018-04-04) / Apache-2.0 / (0)
apache-hivemall
Apache Hivemall released binaries for Spark Packages
@apache-hivemall / Latest release: 0.5.1-spark2.2 (2018-04-05) / Apache-2.0 / (0)
net.jgp.labs.informix2spark
Transfer IBM Informix data to Apache Spark using JDBC
@jgperrin / No release yet / (1)