SparkSession returns nothing with a HiveServer2 connection through JDBC

两盒软妹~` posted on 2019-12-21 04:53:10

Question


I have an issue about reading data from a remote HiveServer2 using JDBC and SparkSession in Apache Zeppelin.

Here is the code.

%spark

import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

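// Connection properties for the remote HiveServer2
val prop = new java.util.Properties
prop.setProperty("user","hive")
prop.setProperty("password","hive")
prop.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")

// Read the Hive table tests.hello_world over JDBC
val test = spark.read.jdbc("jdbc:hive2://xxx.xxx.xxx.xxx:10000/", "tests.hello_world", prop)

// Expected to print the rows, but only the header shows up
test.select("*").show()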

When I run this, I get no errors but no data either. I only get the column names of the table, like this:

+--------------+
|hello_world.hw|
+--------------+
+--------------+

Instead of this:

+--------------+
|hello_world.hw|
+--------------+
+ data_here    +
+--------------+

I am running all of this on: Scala 2.11.8, OpenJDK 8, Zeppelin 0.7.0, Spark 2.1.0 (bde/spark), Hive 2.1.1 (bde/hive).

I run this setup in Docker, where each of these has its own container, all connected to the same network.

Furthermore, it works fine when I use the Spark beeline to connect to my remote Hive.
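
As a cross-check similar to the beeline test, the same query can be sent through plain java.sql, bypassing Spark's JDBC reader. This is only a rough sketch: the helper names formatRow, readRows and queryHive are made up for illustration, and it assumes the hive-jdbc driver is on the interpreter's classpath.

```scala
import java.sql.{DriverManager, ResultSet}

// Hypothetical helper: join the values of one row for printing.
def formatRow(values: Seq[String]): String = values.mkString(" | ")

// Hypothetical helper: read every remaining row of a ResultSet as a formatted string.
def readRows(rs: ResultSet): List[String] = {
  val columnCount = rs.getMetaData.getColumnCount
  val rows = scala.collection.mutable.ListBuffer.empty[String]
  while (rs.next()) {
    // JDBC column indexes start at 1
    rows += formatRow((1 to columnCount).map(i => rs.getString(i)))
  }
  rows.toList
}

// Hypothetical helper: run one query against HiveServer2 and return its rows.
def queryHive(url: String, user: String, password: String, sql: String): List[String] = {
  // Assumption: hive-jdbc is available, otherwise this throws ClassNotFoundException
  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection(url, user, password)
  try {
    val stmt = conn.createStatement()
    try readRows(stmt.executeQuery(sql)) finally stmt.close()
  } finally conn.close()
}

// Usage, with the URL and credentials from the snippet above:
// queryHive("jdbc:hive2://xxx.xxx.xxx.xxx:10000/", "hive", "hive",
//   "SELECT * FROM tests.hello_world LIMIT 10").foreach(println)
```

If this prints rows while spark.read.jdbc still returns only the header, the problem probably lies in how Spark talks to Hive over JDBC rather than in the network or in HiveServer2 itself.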

Did I forget something? Any help would be appreciated. Thanks in advance.

EDIT:

I've found a workaround: share a Docker volume or a Docker data container between Spark and Hive, more precisely the Hive warehouse folder, and configure spark-defaults.conf accordingly. You can then access Hive through SparkSession without JDBC. Here is how to do it step by step:

  1. Share the Hive warehouse folder between Spark and Hive.
  2. Configure spark-defaults.conf like this:

    spark.serializer            org.apache.spark.serializer.KryoSerializer
    spark.driver.memory         Xg
    spark.driver.cores          X
    spark.executor.memory       Xg
    spark.executor.cores        X
    spark.sql.warehouse.dir     file:///your/path/here

Replace 'X' with your values.
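
With that configuration in place, reading the table without JDBC might look roughly like the sketch below. It makes several assumptions that are not stated above: Spark was built with Hive support, Spark can reach the same Hive metastore as Hive (for example through a hive-site.xml on its classpath), and the application name is arbitrary. In Zeppelin the spark session already exists, so getOrCreate() simply returns it.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Hive classes are on the classpath, otherwise enableHiveSupport() fails.
// spark.sql.warehouse.dir is taken from spark-defaults.conf as configured above.
val spark = SparkSession.builder()
  .appName("hive-without-jdbc") // arbitrary name, for illustration only
  .enableHiveSupport()
  .getOrCreate()

// Read the table through the Hive catalog instead of the JDBC reader.
val test = spark.table("tests.hello_world")
test.show()
```

If the table is not found, Spark is most likely looking at a different metastore than the one Hive uses.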

Hope it helps.

Source: https://stackoverflow.com/questions/41722376/sparksession-return-nothing-with-an-hiveserver2-connection-throught-jdbc
