Question
I have an issue reading data from a remote HiveServer2 using JDBC and a SparkSession in Apache Zeppelin.
Here is the code.
%spark
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

// JDBC connection properties for the remote HiveServer2
val prop = new java.util.Properties
prop.setProperty("user", "hive")
prop.setProperty("password", "hive")
prop.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")

// Read the table tests.hello_world from HiveServer2 over JDBC
val test = spark.read.jdbc("jdbc:hive2://xxx.xxx.xxx.xxx:10000/", "tests.hello_world", prop)
test.select("*").show()
When I run this, I get no errors, but no data either; I only retrieve the column names of the table, like this:
+--------------+
|hello_world.hw|
+--------------+
+--------------+
Instead of this:
+--------------+
|hello_world.hw|
+--------------+
+ data_here +
+--------------+
I'm running all of this on Scala 2.11.8, OpenJDK 8, Zeppelin 0.7.0, Spark 2.1.0 (bde/spark), and Hive 2.1.1 (bde/hive).
I run this setup in Docker; each component has its own container, but they are all connected to the same network.
Furthermore, it works fine when I use Spark's beeline to connect to my remote Hive.
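For reference, the kind of beeline session that does return data looks roughly like this (a sketch; the host and the hive/hive credentials are the same placeholders as in the Scala snippet above):

beeline -u jdbc:hive2://xxx.xxx.xxx.xxx:10000/ -n hive -p hive
select * from tests.hello_world;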
Have I forgotten something? Any help would be appreciated. Thanks in advance.
EDIT:
I've found a workaround: share a Docker volume or a Docker data container between Spark and Hive (more precisely, the Hive warehouse folder), and configure spark-defaults.conf accordingly. Then you can access Hive through a SparkSession without JDBC. Here is how to do it, step by step:
- Share the Hive warehouse folder between the Spark and Hive containers.
- Configure spark-defaults.conf like this:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory Xg
spark.driver.cores X
spark.executor.memory Xg
spark.executor.cores X
spark.sql.warehouse.dir file:///your/path/here
Replace 'X' with your own values, and replace file:///your/path/here with the shared warehouse folder.
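After that, you can read the table through the SparkSession directly. A minimal sketch, assuming the shared warehouse folder is mounted in the Spark container and spark.sql.warehouse.dir points at it:

%spark
// Read tests.hello_world through the SparkSession itself, with no JDBC involved.
// This assumes the Hive warehouse folder is shared between the containers as
// described above, so the table data is visible to Spark.
spark.sql("SELECT * FROM tests.hello_world").show()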
Hope it helps.
Source: https://stackoverflow.com/questions/41722376/sparksession-return-nothing-with-an-hiveserver2-connection-throught-jdbc