Failed to create client - Spark as execution engine with Hive

Submitted by 浪尽此生 on 2019-12-11 14:54:38

Question


I have a 32GB single-node Amazon EMR cluster with Hive 2.3.4, Spark 2.4.2, and Hadoop 2.8.5 installed.

I am trying to configure Spark as the execution engine for Hive.

I have linked the Spark jar files into Hive with the following commands:

sudo ln -s /usr/lib/spark/jars/spark-core_2.11-2.4.2.jar
sudo ln -s /usr/lib/spark/jars/spark-network-common_2.11-2.4.2.jar
sudo ln -s /usr/lib/spark/jars/scala-library-2.11.12.jar
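As written, `ln -s` with no target creates the links in whatever directory the shell happens to be in; Hive on Spark expects them on Hive's classpath. A sketch of the same step with an explicit target, assuming the default EMR paths `/usr/lib/spark/jars` and `/usr/lib/hive/lib` (adjust both for your installation):

```shell
# Assumed EMR default locations -- verify these paths on your cluster.
SPARK_JARS=/usr/lib/spark/jars
HIVE_LIB=/usr/lib/hive/lib

# Link the three Spark jars Hive needs into Hive's lib directory
# so they land on the Hive classpath.
sudo ln -s "$SPARK_JARS/spark-core_2.11-2.4.2.jar"           "$HIVE_LIB/"
sudo ln -s "$SPARK_JARS/spark-network-common_2.11-2.4.2.jar" "$HIVE_LIB/"
sudo ln -s "$SPARK_JARS/scala-library-2.11.12.jar"           "$HIVE_LIB/"
```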

I have also set the execution engine in the hive-site.xml file. I added the following to the hive-site.xml in the /etc/hive/conf/ folder:

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>spark://<EMR hostname>:7077</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>/tmp</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://<EMR hostname>:8020/spark-jars/*</value>
</property>

Also, I have copied all of the Spark jars to an HDFS folder named spark-jars.
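For reference, the upload step might look like the following, again assuming the default EMR Spark jar location and the `/spark-jars` HDFS path that `spark.yarn.jars` points at:

```shell
# Create the HDFS directory referenced by spark.yarn.jars and
# copy every Spark jar into it. Paths are assumptions based on
# a default EMR layout -- adjust for your cluster.
hadoop fs -mkdir -p /spark-jars
hadoop fs -put /usr/lib/spark/jars/*.jar /spark-jars/
```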

When I run a Hive query, I get the following error:

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: false
FAILED: SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.

I have also checked the Hive Hadoop logs, and they only give me the following (the lines are truncated at the source):

2019-07-02T13:33:23,831 ERROR [f7d8916c-25f1-4d90-8919-07c4b3422b35 main([])]: ql.Driver (SessionState.java:printError(1126)) - FAILED: Semanti$
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to c$
        at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:240)
        at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
        at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)
        at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11293)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)

I am running the following HQL file:

set hive.spark.client.server.connect.timeout=300000ms;
set spark.executor.memory=4915m;
set spark.executor.cores=2;
set spark.yarn.executor.memoryOverhead=1229m;
set spark.executor.instances=2;
set spark.driver.memory=4096m;
set spark.yarn.driver.memoryOverhead=400m;

select column_name from table_name group by column_name;

If you need to see any other configuration files, please tell me.

Is this error due to a version incompatibility? Or is it simply not possible to use Spark as the execution engine for Hive on Amazon EMR?

Source: https://stackoverflow.com/questions/56853923/failed-to-create-client-spark-as-execution-engine-with-hive
