An optimizer rule that provides SQL Standard Authorization for Apache Spark
@yaooqinn
Spark Authorizer provides SQL Standard Based Authorization for Apache Spark, similar to SQL Standard Based Hive Authorization. When you use Spark SQL or the Dataset/DataFrame API to load data from tables stored in an Apache Hive metastore, this library provides fine-grained row- and column-level access control through Apache Ranger.
Security is one of the fundamental features for enterprise adoption. Apache Ranger offers security plugins for many Hadoop ecosystem components, such as HDFS, Hive, HBase, Solr and Sqoop2. However, Apache Spark is not yet among them. When a secured HDFS cluster is used as a data warehouse accessed by various users and groups through different applications written with Spark and Hive, it is very difficult to manage data access in a consistent way. Apache Spark users access the data warehouse only with the storage-based access controls offered by HDFS. This library shares the Ranger Hive plugin with Hive to help Spark talk to Ranger Admin.
Include this package in your Spark Applications using:
spark-shell, pyspark, or spark-submit
> $SPARK_HOME/bin/spark-shell --packages yaooqinn:spark-authorizer:1.0.0.spark2.1
If you use the sbt-spark-package plugin, in your sbt build file, add:
spDependencies += "yaooqinn/spark-authorizer:1.0.0.spark2.1"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "yaooqinn" % "spark-authorizer" % "1.0.0.spark2.1"
Maven
In your pom.xml, add:
<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>yaooqinn</groupId>
    <artifactId>spark-authorizer</artifactId>
    <version>1.0.0.spark2.1</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>