A community index of third-party packages for Apache Spark.
Showing packages 1 - 25 out of 25 for search "tags:"PySpark""
pyspark-cassandra
PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.
@TargetHolding / Latest release: 0.3.5 (2016-03-30) / Apache-2.0 / (1)
pyspark-notebook
PySpark notebook with Docker.
@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)
spark-ryft-connector
Spark connector for Ryft ONE
@getryft / Latest release: 0.9.0 (2017-04-04) / other license / (1)
spark-df-profiling
Create HTML profiling reports from Apache Spark DataFrames
@julioasotodv / Latest release: 1.1.2 (2016-07-26) / Apache-2.0 / (1)
dist-keras
Distributed deep learning with Keras and Apache Spark.
@JoeriHermans / No release yet / (0)
spark_jupyter
This library customizes some DataFrame outputs.
@jeanbaptistepriez / No release yet / (1)
sparkhpc
Launching and controlling Spark on HPC clusters made easy.
@rokroskar / No release yet / (0)
bahir:streaming-mqtt
Spark DStream connector for MQTT
@apache / Latest release: 2.2.0 (2017-09-09) / Apache-2.0 / (0)
Twitter-Sentiment-Analyzer
Twitter Sentiment Analysis - PySpark
@DayneSorvisto / No release yet / (1)
spark-tree-plotting
A simple tool for plotting Spark ML's Decision Trees
@julioasotodv / Latest release: 0.2 (2017-03-25) / MIT / (1)
pyspark-cassandra
Python port of the awesome DataStax Spark Cassandra connector. Compatible with Spark 2.0+.
@anguenot / Latest release: 2.4.1 (2022-08-03) / Apache-2.0 / (0)
Optimus
Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.
@ironmussa / Latest release: 1.1.0 (2017-10-25) / Apache-2.0 / (2)
pyspark_dist_explore
Data exploration in PySpark made easy. Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.
@Bergvca / Latest release: 0.1.4 (2017-08-02) / Apache-2.0 / (0)
bahir:sql-cloudant
Spark data source and Spark DStream connector for Apache CouchDB/Cloudant
@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)
spark-nlp
Natural Language Processing Library for Apache Spark.
@JohnSnowLabs / Latest release: 3.0.1 (2021-04-02) / Apache-2.0 / (5)
spark-dirty-cat
Similarity encoding of dirty categorical variables (strings)
@rakutentech / No release yet / (1)
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
@archivesunleashed / Latest release: 0.18.0 (2019-08-21) / Apache-2.0 / (0)
spark-aucmu
Multi-class performance metric AUCmu for Apache Spark
@poweihuang / Latest release: 1.0.0 (2019-10-21) / MIT / (1)
twut
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
@archivesunleashed / No release yet / (0)
artan
Online latent state estimation with Spark
@ozancicek / Latest release: 0.3.0 (2020-05-20) / Apache-2.0 / (1)
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
@G-Research / Latest release: 2.10.0-3.5 (2023-10-07) / Apache-2.0 / (1)