Question
I have an issue reading data from a remote HiveServer2 using JDBC and a SparkSession in Apache Zeppelin.
Here is the code.
%spark
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

// JDBC connection properties for the remote HiveServer2
val prop = new java.util.Properties
prop.setProperty("user", "hive")
prop.setProperty("password", "hive")
prop.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")

// Read the table tests.hello_world from HiveServer2 over JDBC
val test = spark.read.jdbc("jdbc:hive2://xxx.xxx.xxx.xxx:10000/", "tests.hello_world", prop)
test.select("*").show()
When I run this, I get no errors, but no data either; I only retrieve the column names of the table, like this:
+--------------+
|hello_world.hw|
+--------------+
+--------------+
Instead of this:
+--------------+
|hello_world.hw|
+--------------+
+ data_here +
+--------------+
I'm running all of this on Scala 2.11.8, OpenJDK 8, Zeppelin 0.7.0, Spark 2.1.0 (bde/spark), and Hive 2.1.1 (bde/hive).
I run this setup in Docker; each component has its own container, but they are all connected to the same network.
Furthermore, it works fine when I use Spark's beeline to connect to my remote Hive.
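For reference, the kind of beeline session that does return data looks roughly like this (a sketch; the host and the hive/hive credentials are the same placeholders as in the Scala snippet above):

beeline -u jdbc:hive2://xxx.xxx.xxx.xxx:10000/ -n hive -p hive
select * from tests.hello_world;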
Have I forgotten something? Any help would be appreciated. Thanks in advance.
EDIT:
I've found a workaround: share a Docker volume or a Docker data container between Spark and Hive (more precisely, the Hive warehouse folder), and configure spark-defaults.conf accordingly. Then you can access Hive through a SparkSession without JDBC. Here is how to do it, step by step:
- Share the Hive warehouse folder between the Spark and Hive containers.
- Configure spark-defaults.conf like this:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory Xg
spark.driver.cores X
spark.executor.memory Xg
spark.executor.cores X
spark.sql.warehouse.dir file:///your/path/here
Replace 'X' with your own values, and replace file:///your/path/here with the shared warehouse folder.
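After that, you can read the table through the SparkSession directly. A minimal sketch, assuming the shared warehouse folder is mounted in the Spark container and spark.sql.warehouse.dir points at it:

%spark
// Read tests.hello_world through the SparkSession itself, with no JDBC involved.
// This assumes the Hive warehouse folder is shared between the containers as
// described above, so the table data is visible to Spark.
spark.sql("SELECT * FROM tests.hello_world").show()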
Hope it helps.
Source: https://stackoverflow.com/questions/41722376/sparksession-return-nothing-with-an-hiveserver2-connection-throught-jdbc