Generic Connector for Apache Spark
by @alvsanand
This library simplifies connecting an external system to Apache Spark. Its main idea is a core that is responsible for working with Apache Spark, on top of which specific connectors are implemented for each system. It can be used in both batch and streaming scenarios. From the start, the intention is for this to be a _read only_ connector library, so write operations will not be implemented.
Currently, the following connectors are implemented:
- CloudStorageSgcConnector: fetches files from Google Cloud Storage.
- DataTransferSgcConnector: fetches files from DoubleClick Data Transfer.
- FTP servers:
  - FTPSgcConnector: fetches files from an FTP server.
  - FTPSSgcConnector: fetches files from an FTPS server.
  - SFTPSgcConnector: fetches files from an SFTP server.
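The core/connector split described above can be sketched as below. This is an illustrative design under assumed names (`SgcConnectorSketch`, `listFiles`, `fetchFile`, `LocalDirConnector` are all hypothetical), not the library's actual API; it only shows the idea of a read-only contract that each backend implements:

```scala
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical core contract (not the library's real API): a connector can
// only list and fetch files, which keeps the library read only by design.
trait SgcConnectorSketch {
  def listFiles(): Seq[String]
  def fetchFile(name: String): Array[Byte]
}

// Example backend implementing the contract against the local filesystem;
// real connectors would do the same against GCS, FTP, SFTP, etc.
class LocalDirConnector(dir: String) extends SgcConnectorSketch {
  override def listFiles(): Seq[String] =
    Files.list(Paths.get(dir)).iterator().asScala
      .map(_.getFileName.toString).toSeq

  override def fetchFile(name: String): Array[Byte] =
    Files.readAllBytes(Paths.get(dir, name))
}
```

Under such a design, the Spark-facing core would iterate over `listFiles()` and turn each `fetchFile(...)` result into RDD or DStream records, so adding a new system only means implementing the two methods.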
Include this package in your Spark applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages alvsanand:spark-generic-connector:0.2.0-spark_2x-s_2.11
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "alvsanand/spark-generic-connector:0.2.0-spark_2x-s_2.11"
Otherwise, in your sbt build file, add:

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "alvsanand" % "spark-generic-connector" % "0.2.0-spark_2x-s_2.11"
Maven: in your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>alvsanand</groupId>
    <artifactId>spark-generic-connector</artifactId>
    <version>0.2.0-spark_2x-s_2.11</version>
  </dependency>
</dependencies>

<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>