A community index of third-party packages for Apache Spark.

Showing packages 1 - 50 out of 62 for search tags:"Data Sources"
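Many release labels in this listing end in a suffix such as `-s_2.11`, which on spark-packages marks the Scala binary version an artifact was built against. One entry (`0.2.0-spark_2x-s_2.11`) also carries what appears to be a Spark-line tag. The following is a minimal sketch, assuming only the suffix patterns visible on this page, that splits such labels into their parts. `Release` and `parse_release` are hypothetical helpers, not part of any spark-packages tooling.

```python
import re
from typing import NamedTuple, Optional


class Release(NamedTuple):
    version: str                  # the package's own version, e.g. "4.0.0"
    spark_tag: Optional[str]      # e.g. "spark_2x" (assumed meaning: Spark line)
    scala_version: Optional[str]  # e.g. "2.11", from a "-s_2.11" suffix


# Assumption: optional "-spark_<tag>" then optional "-s_<major>.<minor>" at the end.
# The lazy "version" group stops at the first position where the suffixes match.
_PATTERN = re.compile(
    r"^(?P<version>.+?)"
    r"(?:-(?P<spark>spark_[0-9a-z.]+))?"
    r"(?:-s_(?P<scala>\d+\.\d+))?$"
)


def parse_release(label: str) -> Release:
    """Split a release label such as '1.0.0-beta1-s_2.10' into its parts."""
    m = _PATTERN.match(label)
    if m is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return Release(m.group("version"), m.group("spark"), m.group("scala"))
```

For example, `parse_release("3.0.0-preview1")` keeps the whole label as the version because it has no Scala suffix, while `parse_release("1.0.0-beta1-s_2.10")` separates `1.0.0-beta1` from Scala `2.10`.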

Integration utilities for using Spark with Apache Avro data

@databricks / Latest release: 4.0.0-s_2.11 (2017-10-30) / Apache-2.0 / (13)

  • 6|sql
  • 4|input
  • 4|avro


Redshift Data Source for Apache Spark

@databricks / Latest release: 3.0.0-preview1 (2016-11-01) / Apache-2.0 / (3)

  • 2|sql
  • 2|data source
  • 2|redshift


Spark SQL CSV data source

@databricks / Latest release: 1.5.0-s_2.11 (2016-09-07) / Apache-2.0 / (10)

  • 4|csv
  • 3|sql
  • 2|DataSource


Connecting Apache Spark with different data stores

@Stratio / Latest release: 0.7.0-RC1 (2015-01-14) / Apache-2.0 / (20)

  • 6|database
  • 6|mongo
  • 6|cassandra


An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parses CSV data into a SchemaRDD. No installation is required; simply include pyspark_csv.py via SparkContext.

@seahboonsiew / No release yet / (1)

  • 2|python
  • 2|csv
  • 1|sql
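The pyspark_csv entry mentions automatic type inference and null value handling. The following is a minimal pure-Python sketch of that general idea, not pyspark_csv's actual implementation: each column's type is widened from int to float to str as its values require, and a hypothetical set of null tokens is treated as missing. `parse_csv` and `NULL_TOKENS` are illustrative names.

```python
import csv
import io
from typing import List, Optional, Tuple

# Hypothetical null markers; pyspark_csv's real defaults may differ.
NULL_TOKENS = {"", "NA", "null", "NULL", "None"}
# Widening order: a column is int unless some value needs float, float unless some needs str.
_RANK = {int: 0, float: 1, str: 2}


def _guess(value: str) -> Optional[type]:
    """Return the narrowest type that can hold value, or None for a null token."""
    if value.strip() in NULL_TOKENS:
        return None
    for cast in (int, float):
        try:
            cast(value)
            return cast
        except ValueError:
            pass
    return str


def parse_csv(text: str) -> Tuple[List[str], List[str], List[list]]:
    """Parse CSV text with a header row; return (header, type names, typed rows)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)

    # First pass: infer the widest type seen in each column, ignoring nulls.
    col_types: List[Optional[type]] = [None] * len(header)
    for row in rows:
        for i, field in enumerate(row):
            t = _guess(field)
            if t is not None and (col_types[i] is None or _RANK[t] > _RANK[col_types[i]]):
                col_types[i] = t
    final = [t or str for t in col_types]  # all-null columns default to str

    # Second pass: convert every field to its column type, mapping nulls to None.
    typed = [
        [None if _guess(f) is None else final[i](f) for i, f in enumerate(row)]
        for row in rows
    ]
    return header, [t.__name__ for t in final], typed
```

Given `"id,score,name\n1,2.5,alice\n2,,bob\n3,4,NA\n"`, the `score` column is inferred as float (so `4` becomes `4.0`), and the empty field and `NA` both become `None`.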


MongoDB data source for Spark SQL

@Stratio / Latest release: 0.12.0 (2016-08-31) / Apache-2.0 / (14)

  • 5|MongoDB
  • 5|Spark SQL
  • 2|sql


PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.

@TargetHolding / Latest release: 0.3.5 (2016-03-30) / Apache-2.0 / (1)

  • 1|python
  • 1|spark
  • 1|sql


Connects Spark to Cassandra

@datastax / Latest release: 2.4.0-s_2.11 (2018-11-29) / Apache-2.0 / (14)

  • 3|spark
  • 3|cassandra
  • 2|nosql


Power BI API adapter for Apache Spark

@granturing / Latest release: 1.5.0_0.0.7 (2015-09-13) / Apache-2.0 / (0)

  • 2|streaming
  • 1|sql
  • 1|realtime


Spark connector for SequoiaDB

@SequoiaDB / Latest release: 1.12-s_2.11 (2015-03-30) / Apache-2.0 / (2)

  • 2|sequoiadb
  • 2|nosql
  • 2|sql


Spark SQL IBM Cloudant External Datasource

@cloudant / No release yet / (1)

  • 1|data source
  • 1|sql


Deprecated; please see couchbase/couchbase-spark-connector.

@couchbaselabs / Latest release: 1.0.0 (2015-10-20) / Apache-2.0 / (1)

  • 1|streaming
  • 1|library
  • 1|sql


Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL

@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 / (1)

  • 1|sas
  • 1|tools
  • 1|sql


Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

@LucidWorks / Latest release: 2.0.1 (2016-06-09) / Apache-2.0 / (1)

  • 1|ml
  • 1|data sources
  • 1|solr


Spark and Spark SQL integration for Succinct

@amplab / Latest release: 0.1.8 (2019-07-10) / Apache-2.0 / (1)

  • 1|application
  • 1|data source


PySpark support for Elasticsearch

@TargetHolding / Latest release: 0.4.2 (2016-03-22) / Apache-2.0 / (1)

  • 1|python
  • 1|spark
  • 1|database


Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet / (11)

  • 3|API
  • 2|tools
  • 2|machine learning


Official integration between Apache Spark and Elasticsearch real-time search and analytics

@elastic / Latest release: 5.3.1 (2017-04-21) / Apache-2.0 / (3)

  • 1|search
  • 1|elasticsearch
  • 1|sql


Geo Spatial Data Analytics on Spark

@harsha2010 / Latest release: 1.0.5-s_2.11 (2017-08-14) / Apache-2.0 / (1)

  • 2|geospatial
  • 2|data source
  • 2|sql


An Apache Spark utility for pulling Tweets from Gnip's PowerTrack in real time

@knoldus / No release yet / (1)

  • 1|streaming
  • 1|data source
  • 1|scala


Spark Salesforce Wave Connector

@springml / Latest release: 1.2.0 (2018-04-25) / Apache-2.0 / (2)

  • 1|salesforce
  • 1|data source


Infinispan Spark Connector

@infinispan / Latest release: 0.9 (2018-11-05) / Apache-2.0 / (0)

  • 1|streaming
  • 1|sql
  • 1|scala


Read Spark SQL Parquet files as RDD[Protobuf]

@saurfang / Latest release: 0.1.2-s_2.10 (2015-08-18) / Apache-2.0 / (0)

  • 1|data source
  • 1|protobuf
  • 1|sql


Spark on Aliyun, supporting interactions with Aliyun's base services.

@aliyun / No release yet / (1)

  • 1|streaming
  • 1|data source


Spark mainframe connector

@Syncsort / Latest release: 1.0.0 (2015-09-01) / Apache-2.0 / (0)

  • 1|input
  • 1|data source
  • 1|sql


Docker-based, End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark Streaming, ML, MLlib, GraphX, Kafka, Cassandra, Redis, Apache Zeppelin, Spark-Notebook, iPython/Jupyter Notebook, Tableau, H2O Flow, Tachyon,

@fluxcapacitor / No release yet / (3)

  • 2|streaming
  • 2|kafka
  • 1|machine learning


XML data source for Spark SQL and DataFrames

@HyukjinKwon / Latest release: 0.1.1-s_2.10 (2015-11-19) / Apache-2.0 / (1)

  • 1|sql
  • 1|DataSource
  • 1|SparkSQL


Popular ML Datasets for Spark ML (MNIST, IRIS, CIFAR)

@cookieai / Latest release: 0.1.0 (2015-12-22) / Apache-2.0 / (0)

  • 1|data source
  • 1|machine learning


Google Spreadsheets datasource for SparkSQL and DataFrames

@potix2 / Latest release: 0.6.3-s_2.11 (2019-08-21) / Apache-2.0 / (1)

  • 1|sql
  • 1|data source
  • 1|scala


Spark Package to read and write PLY, LAS and XYZ lidar point clouds using Spark SQL.

@IGNF / Latest release: 0.1.0-s_2.10 (2015-12-08) / Apache-2.0 / (0)

  • 1|geospatial
  • 1|data source
  • 1|sql


Spark connector for Ryft ONE

@getryft / Latest release: 0.9.0 (2017-04-04) / other license / (1)

  • 1|search
  • 1|pyspark
  • 1|scala


Spark connector for SFTP

@springml / Latest release: 1.1.3 (2018-10-01) / Apache-2.0 / (2)

  • 1|data source


Data source for querying SPARQL endpoints

@USU-Research / Latest release: 1.0.0-beta1-s_2.10 (2016-01-27) / Apache-2.0 / (0)

  • 1|data source
  • 1|sparql
  • 1|sql


NetFlow data source for Spark SQL and DataFrames

@sadikovi / Latest release: 2.1.0-s_2.12 (2020-12-24) / Apache-2.0 / (2)

  • 1|input
  • 1|library
  • 1|sql


Spark uploader for S3

@knoldus / No release yet / (1)

  • 2|data source
  • 1|scala


The Official Couchbase Spark Connector

@couchbase / Latest release: 2.2.0 (2017-09-20) / Apache-2.0 / (2)

  • 1|streaming
  • 1|library
  • 1|sql


SnappyData: OLTP + OLAP Database built on Apache Spark

@SnappyDataInc / Latest release: 1.2.0-s_2.11 (2020-02-07) / Apache-2.0 / (4)

  • 2|database
  • 1|data source
  • 1|sql


Connects Spark to Hazelcast

@erenavsarogullari / Latest release: 1.0.0-s_2.11 (2016-03-07) / Apache-2.0 / (0)

  • 1|streaming
  • 1|spark
  • 1|scala


High-performing connector to object storage for Apache Spark. Supports IBM Cloud Object Storage and OpenStack Swift.

@SparkTC / Latest release: 1.1.4 (2021-12-07) / Apache-2.0 / (1)

  • 1|data source
  • 1|Swift
  • 1|data s


The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

@basho / Latest release: 1.6.3 (2017-03-17) / Apache-2.0 / (2)

  • 3|python
  • 3|riak
  • 3|data source


Google BigQuery support for Spark, SQL, and DataFrames

@spotify / Latest release: 0.2.2-s_2.10 (2017-11-29) / Apache-2.0 / (3)

  • 1|input
  • 1|data source
  • 1|sql


Officially supported, Apache 2 licensed Neo4j Connector for Apache Spark.

@neo4j-contrib / Latest release: 5.3.0 (2024-04-18) / Apache-2.0 / (1)

  • 1|graph
  • 1|data source
  • 1|database


Spark Receiver for SQL or NoSQL Databases like Cassandra, MongoDB, Elasticsearch or JDBC

@Stratio / Latest release: 0.1.0 (2016-06-30) / Apache-2.0 / (1)

  • 1|streaming
  • 1|library
  • 1|sql


Write your RDDs and DStreams to Kafka seamlessly

@BenFradet / Latest release: 0.4.0 (2017-07-22) / Apache-2.0 / (0)

  • 1|streaming
  • 1|data source


Apache Spark datasource for OrientDB

@sbcd90 / No release yet / (1)

  • 1|orientdb
  • 1|spark datasource


Spark SQL datasource for GitHub PR API

@lightcopy / Latest release: 1.3.0-s_2.10 (2016-12-25) / Apache-2.0 / (0)

  • 1|input
  • 1|library
  • 1|sql


A Spark datasource for the HadoopCryptoLedger library

@ZuInnoTe / Latest release: 1.3.2-s_2.12 (2021-12-24) / Apache-2.0 / (1)

  • 1|hadoocryptoledger
  • 1|data source
  • 1|bitcoin


A Spark datasource for the HadoopOffice library

@ZuInnoTe / Latest release: 1.7.0-s_2.13 (2022-10-29) / Apache-2.0 / (1)

  • 1|data source
  • 1|excel
  • 1|office


Generic Connector for Apache Spark

@alvsanand / Latest release: 0.2.0-spark_2x-s_2.11 (2017-01-17) / Apache-2.0 / (1)

  • 1|streaming
  • 1|data source
  • 1|Google Cloud


Spark Tensorflow Connector

@tapanalyticstoolkit / Latest release: 1.0.0-s_2.11 (2017-02-21) / Apache-2.0 / (3)

  • 2|tensorflow
  • 2|data source
  • 1|library