Question
I am trying to submit a Spark job to the CDH YARN cluster via the following commands.
I have tried several combinations, and none of them work. I now have all the POI jars both in my local /root and in HDFS under /user/root/lib, so I have tried the following:
spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars /root/poi-3.12.jars, /root/poi-ooxml-3.12.jar, /root/poi-ooxml-schemas-3.12.jar
spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars file:/root/poi-3.12.jars, file:/root/poi-ooxml-3.12.jar, file:/root/poi-ooxml-schemas-3.12.jar
spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars hdfs://mynamenodeIP:8020/user/root/poi-3.12.jars,hdfs://mynamenodeIP:8020/user/root/poi-ooxml-3.12.jar,hdfs://mynamenodeIP:8020/user/root/poi-ooxml-schemas-3.12.jar
How do I propagate the jars to all cluster nodes? None of the above works, and the job still cannot resolve the class, as I keep getting the same error:
java.lang.NoClassDefFoundError: org/apache/poi/ss/usermodel/WorkbookFactory
The same command works with "--master local" without specifying --jars, because I have copied my jars to /opt/cloudera/parcels/CDH/lib/spark/lib.
However, for yarn-cluster mode I need to distribute the external jars to all cluster nodes, and the commands above do not work.
Appreciate your help, thanks.
P.S. I am using CDH 5.4.2 with Spark 1.3.0.
Answer 1:
According to the help options for spark-submit:

--jars includes the local jars on the driver and executor classpaths. [It only sets the classpath.]
--files copies the files your application needs into the working directory of each executor node. [It actually transports your jar to the working dir.]

Note: This is similar to the -file option in Hadoop Streaming, which ships the mapper/reducer scripts to the worker nodes.

So try the --files option as well; a sketch follows the help excerpt below.
$ spark-submit --help
Options:
  --jars JARS      Comma-separated list of local jars to include on the driver
                   and executor classpaths.
  --files FILES    Comma-separated list of files to be placed in the working
                   directory of each executor.
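For example, here is a minimal sketch of this suggestion, reusing the jar paths and main class from the question. (Note: spark-submit only recognizes options placed before the application jar, the comma-separated lists must contain no spaces, and "poi-3.12.jars" in the question looks like a typo for "poi-3.12.jar".)

    # Sketch only: all options must precede the application jar,
    # and the comma-separated lists must have no spaces after the commas.
    spark-submit --master yarn-cluster \
        --class "ReadExcelSC" \
        --jars /root/poi-3.12.jar,/root/poi-ooxml-3.12.jar,/root/poi-ooxml-schemas-3.12.jar \
        --files /root/poi-3.12.jar,/root/poi-ooxml-3.12.jar,/root/poi-ooxml-schemas-3.12.jar \
        ./excel_sc.jar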
Hope this helps.
Answer 2:
Have you tried the solution posted in this thread: Spark on yarn jar upload problems?
The problem there was solved by copying spark-assembly.jar into a directory on HDFS and then passing it to spark-submit via --conf spark.yarn.jar. The commands are listed below:
hdfs dfs -copyFromLocal /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar /user/spark/spark-assembly.jar
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar simplemr.jar
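If you try this approach, it may help to first confirm that the assembly jar is actually visible at the HDFS path passed to spark.yarn.jar (path taken from the commands above):

    # Verify the uploaded jar exists at the expected HDFS location
    hdfs dfs -ls /user/spark/spark-assembly.jar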
Source: https://stackoverflow.com/questions/31602128/spark-submit-yarn-cluster-with-jars-does-not-work