A community index of third-party packages for Apache Spark.

Showing packages 1-25 of 25 for search: tags:"PySpark"

PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.

@TargetHolding / Latest release: 0.3.5 (2016-03-30) / Apache-2.0 / (1)

  • 1|python
  • 1|spark
  • 1|sql


PySpark Notebook with Docker.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)

  • 2|python
  • 1|docker
  • 1|pyspark


Enhanced Python DataFrames for Spark

@dondrake / No release yet / (0)

  • 1|python
  • 1|sql
  • 1|pyspark


Spark connector for Ryft ONE

@getryft / Latest release: 0.9.0 (2017-04-04) / other license / (1)

  • 1|search
  • 1|pyspark
  • 1|scala


Create HTML profiling reports from Apache Spark DataFrames

@julioasotodv / Latest release: 1.1.2 (2016-07-26) / Apache-2.0 / (1)

  • 1|tools
  • 1|pyspark


Distributed deep learning with Keras and Apache Spark.

@JoeriHermans / No release yet / (0)

  • 1|machine learning
  • 1|pyspark


This library customizes some DataFrame outputs.

@jeanbaptistepriez / No release yet / (1)

  • 1|python
  • 1|jupyter
  • 1|pyspark


Launching and controlling Spark on HPC clusters made easy

@rokroskar / No release yet / (0)

  • 1|hpc
  • 1|high-performance computing
  • 1|pyspark


Spark DStream connector for MQTT

@apache / Latest release: 2.2.0 (2017-09-09) / Apache-2.0 / (0)

  • 1|python
  • 1|streaming
  • 1|pyspark


Twitter Sentiment Analysis - PySpark

@DayneSorvisto / No release yet / (1)

  • 1|twitter
  • 1|machine learning
  • 1|pyspark


A simple tool for plotting Spark ML's Decision Trees

@julioasotodv / Latest release: 0.2 (2017-03-25) / MIT / (1)

  • 1|machine learning
  • 1|pyspark


Python port of the awesome DataStax Spark Cassandra connector. Compatible with Spark 2.0+.

@anguenot / Latest release: 2.4.1 (2022-08-03) / Apache-2.0 / (0)

  • 1|python
  • 1|nosql
  • 1|cassandra


Microsoft Machine Learning for Apache Spark

@Azure / Latest release: 0.17 (2019-04-23) / MIT / (4)

  • 3|ml
  • 3|Microsoft
  • 3|machine learning


Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.

@ironmussa / Latest release: 1.1.0 (2017-10-25) / Apache-2.0 / (2)

  • 1|machine learning
  • 1|tools
  • 1|pyspark


Data exploration in PySpark made easy: pyspark_dist_explore provides methods for getting quick insights into your Spark DataFrames.

@Bergvca / Latest release: 0.1.4 (2017-08-02) / Apache-2.0 / (0)

  • 1|python
  • 1|pyspark
  • 1|Histogram


Spark data source and Spark DStream connector for Apache CouchDB/Cloudant

@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)

  • 1|python
  • 1|streaming
  • 1|sql


Natural Language Processing Library for Apache Spark.

@JohnSnowLabs / Latest release: 3.0.1 (2021-04-02) / Apache-2.0 / (5)

  • 2|NLP
  • 2|machine-learning
  • 2|pyspark


Similarity encoding of dirty categorical variables (strings)

@rakutentech / No release yet / (1)

  • 1|ml
  • 1|machine learning
  • 1|pyspark


The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

@archivesunleashed / Latest release: 0.18.0 (2019-08-21) / Apache-2.0 / (0)

  • 1|pyspark
  • 1|tools
  • 1|Web archives


Multi-class performance metric aucmu for Apache Spark

@poweihuang / Latest release: 1.0.0 (2019-10-21) / MIT / (1)

  • 1|machine learning
  • 1|pyspark


Large-Scale Multi-View Learning in PySpark

@jpdunc23 / No release yet / (0)

  • 1|machine learning
  • 1|pyspark


An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.

@archivesunleashed / No release yet / (0)

  • 1|pyspark
  • 1|tools
  • 1|Digital Humanities


Online latent state estimation with Spark

@ozancicek / Latest release: 0.3.0 (2020-05-20) / Apache-2.0 / (1)

  • 1|streaming
  • 1|machine learning
  • 1|pyspark


Approximate nearest-neighbor search using Hierarchical Navigable Small World (HNSW) graphs

@jelmerk / No release yet / (1)

  • 1|python
  • 1|machine learning
  • 1|pyspark


A library that provides useful extensions to Apache Spark and PySpark.

@G-Research / Latest release: 2.10.0-3.5 (2023-10-07) / Apache-2.0 / (1)

  • 1|core
  • 1|pyspark