Hive tables not found when running in YARN-Cluster mode

白昼怎懂夜的黑 submitted on 2019-11-28 12:41:25

I posted this same question on the Hortonworks community, and I resolved the issue with the help of this answer.

The gist: when submitting the application, the --files argument must come before the --jars argument, and the hive-site.xml to pass is the copy in the Spark conf directory, not $HIVE_HOME/conf/hive-site.xml. Hence:

  ./bin/spark-submit \
  --class com.myCompany.Main \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 1g \
  --executor-memory 11g \
  --executor-cores 1 \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar \
  /home/spark/apps/YarnClusterTest.jar

If you are able to fetch data using Hive CLI, then use the same hive-site.xml in your Spark job.

If the tables are still not found, the likely cause is the metastore location defined in hive-site.xml.
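To check which metastore a given hive-site.xml points at, something like the following could help. It is only a rough sketch: it assumes the usual Hadoop-style layout (property elements with name and value children under configuration) and looks up hive.metastore.uris. The helper name read_hive_property and the sample host are made up for illustration; in practice you would read the file you pass via --files.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name):
    """Return the value of property `name` from hive-site.xml content, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each <property> is expected to hold a <name> and a <value> child.
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical sample content; the host name below is an assumption,
# not taken from any real cluster.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host.example:9083</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(read_hive_property(SAMPLE, "hive.metastore.uris"))
```

Running the same lookup against the file the Hive CLI uses and the one shipped with the Spark job shows quickly whether the two point at different metastores.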
