A community index of third-party packages for Apache Spark.
Showing packages 51 - 100 out of 515
MLlib-dropout
Package adding dropout regularization to the Apache Spark MLlib project
@rakeshchalasani / No release yet / (1)
spark-infotheoretic-feature-selection
Feature Selection framework based on Information Theory that includes: mRMR, InfoGain, JMI and other commonly used FS filters.
@sramirez / Latest release: 1.4.4 (2017-09-25) / Apache-2.0 / (8)
spark-MDLP-discretization
Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP)
@sramirez / Latest release: 1.4.1 (2017-09-25) / Apache-2.0 / (7)
spark-ml-class
Coursera Machine Learning class examples in Spark
@zinniasystems / No release yet / (0)
spark-archetype-scala
Maven archetype used to bootstrap a Spark Scala project
@mbonaci / Latest release: 0.9 (2015-04-24) / MIT / (0)
couchbase-spark-connector
Deprecated; please see couchbase/couchbase-spark-connector
@couchbaselabs / Latest release: 1.0.0 (2015-10-20) / Apache-2.0 / (1)
spark-pmml-exporter-validator
Using JPMML Evaluator to validate the PMML models exported from Spark
@selvinsource / No release yet / (1)
spark-sas7bdat
Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL
@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 / (1)
spark-solr
Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
@LucidWorks / Latest release: 2.0.1 (2016-06-09) / Apache-2.0 / (1)
RabbitMQ-Receiver
RabbitMQ Spark Streaming receiver
@Stratio / Latest release: 0.4.0 (2016-12-20) / Apache-2.0 / (10)
streaming-matrix-factorization
Streaming Recommendation Engine using matrix factorization with user and product bias
@brkyvz / Latest release: 0.1.0 (2015-05-26) / Apache-2.0 / (2)
pyspark-elastic
PySpark support for Elasticsearch
@TargetHolding / Latest release: 0.4.2 (2016-03-22) / Apache-2.0 / (1)
dissolve-struct
Distributed solver library for large-scale structured output prediction
@dalab / No release yet / (0)
DDF
Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine
@ddf-project / No release yet / (11)
spark-datetime
A library for exposing dateTime functions from the Joda library as SQL functions, with a DSL to build dateTime Catalyst expressions.
@SparklineData / Latest release: 0.0.2 (2015-10-29) / Apache-2.0 / (1)
adam
A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
@bigdatagenomics / No release yet / (1)
spark-deployer
Deploy a Spark cluster in an easy way.
@pishen / Latest release: 0.5.1 (2015-06-25) / Apache-2.0 / (0)
sparkboost
A distributed implementation of AdaBoost.MH and MP-Boost using Apache Spark
@tizfa / Latest release: 0.6 (2015-07-01) / Apache-2.0 / (0)
hivemall-spark
A Hivemall wrapper for Spark
@maropu / Latest release: 0.0.6 (2016-04-07) / Apache-2.0 / (0)
elasticsearch-hadoop
Official integration between Apache Spark and Elasticsearch real-time search and analytics
@elastic / Latest release: 5.3.1 (2017-04-21) / Apache-2.0 / (3)
patchwork
Highly Scalable Grid-Density Clustering Algorithm for Spark MLlib
@thomastriplet / No release yet / (0)
jaws-spark-sql-rest
RESTful service for running Spark SQL/Shark queries on top of Spark, with Mesos and Tachyon support.
@Atigeo / No release yet / (0)
spark-job-rest
RESTful service that enables support for multiple Spark contexts created from the same server.
@Atigeo / No release yet / (0)
modelmatrix
Alternative to Spark machine learning pipeline feature extractors, focused on building sparse feature vectors.
@collectivemedia / No release yet / (1)
spark-knn-graphs
Spark algorithms for building and processing k-nn graphs
@tdebatty / Latest release: 0.13 (2016-02-17) / MIT / (1)
Spark-SQL-on-HBase
Native, optimized access to HBase data through Spark SQL/DataFrame interfaces
@Huawei-Spark / Latest release: 1.0.0 (2015-07-17) / Apache-2.0 / (1)
xpatterns-xframe
Simplified tabular data processing library for Spark
@Atigeo / No release yet / (0)
magellan
Geo Spatial Data Analytics on Spark
@harsha2010 / Latest release: 1.0.5-s_2.11 (2017-08-14) / Apache-2.0 / (1)
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.
@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)
pyspark-notebook
PySpark notebook with Docker.
@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)
spark-json-relay
SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC.
@hammerlab / Latest release: 2.0.1 (2015-10-12) / Apache-2.0 / (0)
spark-streaming-gnip
An Apache Spark utility for pulling Tweets from Gnip's PowerTrack in real time
@knoldus / No release yet / (1)
spark-on-hbase
Generic solution for scanning, joining and mutating HBase tables to and from Spark RDDs.
@michal-harish / No release yet / (0)
spark-salesforce
Spark Salesforce Wave Connector
@springml / Latest release: 1.2.0 (2018-04-25) / Apache-2.0 / (2)
spark-centrality
Library for computing centrality for graph nodes
@webgeist / Latest release: 0.11 (2015-08-09) / LGPL-3.0 / (3)