A community index of third-party packages for Apache Spark.

Showing packages 1-17 of 17 for search tags:"PySpark"

PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.

@TargetHolding / Latest release: 0.3.5 (2016-03-30) / Apache-2.0 / (1)

  • 1|python
  • 1|spark
  • 1|sql


PySpark notebook with Docker.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 / (1)

  • 2|python
  • 1|docker
  • 1|pyspark


Enhanced Python Dataframes for Spark

@dondrake / No release yet / (0)

  • 1|python
  • 1|sql
  • 1|pyspark


Spark connector for Ryft ONE

@getryft / Latest release: 0.9.0 (2017-04-04) / other license / (1)

  • 1|search
  • 1|pyspark
  • 1|scala


Create HTML profiling reports from Apache Spark DataFrames

@julioasotodv / Latest release: 1.1.2 (2016-07-26) / Apache-2.0 / (1)

  • 1|tools
  • 1|pyspark


Distributed deep learning with Keras and Apache Spark.

@JoeriHermans / No release yet / (0)

  • 1|machine learning
  • 1|pyspark


This library customizes some DataFrame outputs.

@jeanbaptistepriez / No release yet / (1)

  • 1|pyth
  • 1|jupyter
  • 1|pyspark


Launching and controlling Spark on HPC clusters made easy.

@rokroskar / No release yet / (0)

  • 1|hpc
  • 1|high-performance computing
  • 1|pyspark


Spark DStream connector for MQTT

@apache / Latest release: 2.2.0 (2017-09-09) / Apache-2.0 / (0)

  • 1|python
  • 1|streaming
  • 1|pyspark


Twitter Sentiment Analysis - PySpark

@DayneSorvisto / No release yet / (1)

  • 1|twitter
  • 1|machine learning
  • 1|pyspark


A simple tool for plotting Spark ML's Decision Trees

@julioasotodv / Latest release: 0.2 (2017-03-25) / MIT / (1)

  • 1|machine learning
  • 1|pyspark


Python port of the awesome DataStax Spark Cassandra connector. Compatible with Spark 2.0+.

@anguenot / Latest release: 0.6.0 (2017-10-05) / Apache-2.0 / (0)

  • 1|python
  • 1|nosql
  • 1|cassandra


Microsoft Machine Learning for Apache Spark

@Azure / Latest release: 0.9 (2017-10-14) / MIT / (3)

  • 3|ml
  • 3|Microsoft
  • 3|machine learning


Optimus is the missing library for cleansing (cleaning and much more) and pre-processing data in a distributed fashion with Apache Spark.

@ironmussa / Latest release: 1.0.3 (2017-10-03) / Apache-2.0 / (2)

  • 1|machine learning
  • 1|tools
  • 1|pyspark


Data exploration in PySpark made easy: Pyspark_dist_explore provides methods to get fast insights into your Spark DataFrames.

@Bergvca / Latest release: 0.1.4 (2017-08-02) / Apache-2.0 / (0)

  • 1|python
  • 1|pyspark
  • 1|Histogram


Spark data source and Spark DStream connector for Apache CouchDB/Cloudant

@apache / Latest release: 2.2.0 (2017-09-18) / Apache-2.0 / (0)

  • 1|python
  • 1|streaming
  • 1|sql


Natural Language Processing Library for Apache Spark.

@JohnSnowLabs / Latest release: 1.2.0 (2017-10-17) / Apache-2.0 / (2)

  • 1|NLP
  • 1|machine-learning
  • 1|pyspark