Spark Submit Issue

Submitted by 断了今生、忘了曾经 on 2019-12-07 15:06:47

Question


I am trying to run a fat jar on a Spark cluster using spark-submit. I created the cluster on AWS using the "spark-ec2" script included in the Spark bundle.

The command I am using to run the jar file is

bin/spark-submit --class edu.gatech.cse8803.main.Main --master yarn-cluster ../src1/big-data-hw2-assembly-1.0.jar

In the beginning it gave me an error saying that at least one of the HADOOP_CONF_DIR or YARN_CONF_DIR environment variables must be set. I didn't know what to set them to, so I used the following command:

export HADOOP_CONF_DIR=/mapreduce/conf

Now the error has changed to

Could not load YARN classes. This copy of Spark may not have been compiled with YARN support.
Run with --help for usage help or --verbose for debug output

The home directory structure is as follows

ephemeral-hdfs  hadoop-native  mapreduce  persistent-hdfs  scala  spark  spark-ec2  src1  tachyon

I even set the YARN_CONF_DIR variable to the same value as HADOOP_CONF_DIR, but the error message did not change. I am unable to find any documentation that addresses this issue; most sources just mention these two variables without further detail.


Answer 1:


You need to compile Spark with YARN support to use it.

Follow the steps explained here: https://spark.apache.org/docs/latest/building-spark.html

Maven:

build/mvn -Pyarn -Phadoop-2.x -Dhadoop.version=2.x.x -DskipTests clean package

SBT:

build/sbt -Pyarn -Phadoop-2.x assembly

You can also download a pre-compiled version here: http://spark.apache.org/downloads.html (choose a package pre-built for Hadoop).
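One way to check whether a given Spark build actually includes YARN support is to look inside the assembly jar for Spark's YARN deploy classes (`org.apache.spark.deploy.yarn`). The jar path below is an assumption; adjust it to match your installation layout:

```shell
# Hypothetical path; spark-ec2 typically places the assembly under spark/lib.
# If this prints class entries, the build was compiled with YARN support;
# no output means YARN classes are missing and you need a YARN-enabled build.
jar tf spark/lib/spark-assembly-*.jar | grep 'org/apache/spark/deploy/yarn'
```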




Answer 2:


Download a prebuilt Spark distribution that supports Hadoop 2.x from https://spark.apache.org/downloads.html




Answer 3:


The --master argument should be: --master spark://hostname:7077, where hostname is the name of your Spark master server. You can also specify this value as spark.master in the spark-defaults.conf file and leave out the --master argument when invoking spark-submit from the command line. Including the --master argument will override the value set (if it exists) in spark-defaults.conf.
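For example, to avoid passing --master on every invocation, the value can be set once in conf/spark-defaults.conf. The hostname below is a placeholder, and the jar path is taken from the question:

```shell
# conf/spark-defaults.conf (hostname is a placeholder for your master node)
#   spark.master    spark://your-master-hostname:7077

# spark-submit then picks up the master from the config file,
# so no --master flag is needed:
bin/spark-submit --class edu.gatech.cse8803.main.Main ../src1/big-data-hw2-assembly-1.0.jar
```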

Reference: http://spark.apache.org/docs/1.3.0/configuration.html



Source: https://stackoverflow.com/questions/29585307/spark-submit-issue
