A SparkSQL extension, packaged as a library for Apache Spark, that extends and improves Spark's capabilities so it can act as a data federation system.
@Stratio
Crossdata extends SparkSQL through the XDContext and acts as a data federation system. It also provides extended capabilities for some data sources, such as Cassandra (Datastax), MongoDB (Stratio) and Elasticsearch (elastic). Moreover, it extends the SparkSQL language with custom SQL-like statements that add metadata discovery operations and the creation of external tables. These extensions also allow queries to be resolved natively, that is, Crossdata goes directly to the persistence layer. This native access speeds up some queries and avoids using Spark cluster resources, relieving memory pressure when possible. In addition, Crossdata supports both batch and streaming processing, so you can combine data from both kinds of source using, again, a SQL-like language. These SparkSQL extensions also include persistent metadata and logical views.
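To make the description above concrete, here is a minimal sketch of how an XDContext might be used to register an external Cassandra table and query it with SQL. Everything specific in it is an assumption rather than something stated on this page: the import path `org.apache.spark.sql.crossdata.XDContext`, the constructor taking a `SparkContext`, the provider name `com.stratio.crossdata.connector.cassandra`, the option keys, and the hypothetical `highschool.students` table on a local Cassandra node. Check the Crossdata documentation for the version you install before relying on any of these names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumed package path for XDContext; verify against your Crossdata version.
import org.apache.spark.sql.crossdata.XDContext

object CrossdataSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("crossdata-sketch")
      .setMaster("local[*]") // local mode, for illustration only
    val sc = new SparkContext(conf)

    try {
      // Assumption: XDContext is constructed from a SparkContext,
      // in the same way as a plain SQLContext.
      val xdContext = new XDContext(sc)

      // Register a hypothetical Cassandra table as a temporary table.
      // The provider name and every option key below are assumptions.
      xdContext.sql(
        """CREATE TEMPORARY TABLE students
          |USING com.stratio.crossdata.connector.cassandra
          |OPTIONS (
          |  keyspace "highschool",
          |  table "students",
          |  cluster "students",
          |  pushdown "true",
          |  spark_cassandra_connection_host "127.0.0.1"
          |)""".stripMargin)

      // Query the table with ordinary SQL and print the first rows.
      xdContext.sql("SELECT * FROM students LIMIT 10").show()
    } finally {
      sc.stop() // release Spark resources even if a statement fails
    }
  }
}
```

Inside a spark-shell session, where `sc` is already defined, only the lines from `new XDContext(sc)` onward would be needed. The sketch requires a reachable Cassandra instance containing the hypothetical keyspace and table, so it is meant as a starting point rather than a ready-to-run program.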
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages Stratio:spark-crossdata:1.4.0
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "Stratio/spark-crossdata:1.4.0"
Otherwise, add:
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "Stratio" % "spark-crossdata" % "1.4.0"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>Stratio</groupId>
    <artifactId>spark-crossdata</artifactId>
    <version>1.4.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>