AWS EMR 5.11.0 - Apache Hive on Spark

Submitted by 佐手、 on 2019-12-01 08:01:55

Sorry, but Hive on Spark is not yet supported on EMR. I have not tried it myself, but I suspect the likely cause of your errors is a mismatch between the Spark version shipped with EMR and the Spark version that Hive depends on. The last time I checked, Hive on Spark did not support Spark 2.x. Given that your first error is a NoSuchFieldError, a version mismatch is the most likely cause; the timeout error may be a red herring.

EMR's Spark build supports Hive version 1.2.1, not the Hive 2.x line. Please check the Hive jar versions available in the /usr/lib/spark/jars/ directory; SPARK_RPC_SERVER_ADDRESS was added in Hive 2.x.
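One quick way to check is to read the version out of the jar file names. A minimal sketch, assuming conventional Hive jar naming (the filename below is illustrative, not taken from a real EMR AMI; on a real cluster you would list /usr/lib/spark/jars/ instead):

```shell
# Extract the Hive version embedded in a jar name.
# The filename here is an example; on a cluster, iterate over
# /usr/lib/spark/jars/hive-*.jar instead.
jar='hive-exec-1.2.1-spark2.jar'
ver=$(echo "$jar" | sed -E 's/.*-([0-9]+\.[0-9]+\.[0-9]+).*\.jar$/\1/')
echo "$ver"
```

If this prints a 1.2.x version, the SPARK_RPC_SERVER_ADDRESS symbol (a Hive 2.x addition) will not be present, which is consistent with the mismatch described above.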

Your sbt build (or pom.xml) dependencies should look like the following:

"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",

"org.apache.spark" %% "spark-sql" % sparkVersion % "provided",

"org.apache.spark" %% "spark-hive" % sparkVersion % "provided",

I am running a data warehouse (Hive) on EMR, and a Spark application stores its data into that warehouse.

I was able to run Hive on Spark by launching it like this:

HIVE_AUX_JARS_PATH=$(find /usr/lib/spark/jars/ -name '*.jar' -and -not -name '*slf4j-log4j12*' -printf '%p:' | head -c-1) hive
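To see what that pipeline does: `find` prints each matching jar path followed by a `:` (note that `-printf` is GNU find), and `head -c-1` strips the trailing colon so the result is a well-formed colon-separated classpath. A small demonstration, using a scratch directory in place of /usr/lib/spark/jars/ so it can run anywhere:

```shell
# Build the classpath string the same way, over a scratch directory
# instead of /usr/lib/spark/jars/.
dir=$(mktemp -d)
touch "$dir/spark-core.jar" "$dir/spark-sql.jar" "$dir/slf4j-log4j12-1.7.16.jar"
# Jars matching *slf4j-log4j12* are excluded to avoid duplicate SLF4J bindings.
cp=$(find "$dir" -name '*.jar' -and -not -name '*slf4j-log4j12*' -printf '%p:' | head -c-1)
echo "$cp"
```

The output contains the two Spark jars joined by `:`, with no trailing colon and no slf4j-log4j12 entry.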

Then, before issuing any other SQL queries:

SET hive.execution.engine = spark;

To make this persistent, add the following line to /home/hadoop/.bashrc:

export HIVE_AUX_JARS_PATH=$(find /usr/lib/spark/jars/ -name '*.jar' -and -not -name '*slf4j-log4j12*' -printf '%p:' | head -c-1)
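If the cluster is reprovisioned or the step is re-run, appending blindly would duplicate the line. A sketch of an idempotent append (a scratch file stands in for /home/hadoop/.bashrc here so the example runs anywhere):

```shell
# Append the export to the rc file only if it is not already there.
rc=$(mktemp)   # stand-in for /home/hadoop/.bashrc
line="export HIVE_AUX_JARS_PATH=\$(find /usr/lib/spark/jars/ -name '*.jar' -and -not -name '*slf4j-log4j12*' -printf '%p:' | head -c-1)"
grep -qF 'HIVE_AUX_JARS_PATH' "$rc" || printf '%s\n' "$line" >> "$rc"
grep -qF 'HIVE_AUX_JARS_PATH' "$rc" || printf '%s\n' "$line" >> "$rc"   # second run is a no-op
count=$(grep -cF 'HIVE_AUX_JARS_PATH' "$rc")
echo "$count"   # the line appears exactly once
```

The `$(...)` is escaped inside the double quotes so the find command is written to the file verbatim and evaluated at login, not at append time.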

And in /etc/hive/conf/hive-site.xml, set:

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>