My knowledge of Spark is limited, and you will sense it after reading this question. I have just one node, with Spark, Hadoop and YARN installed on it.
If you look at the spark.yarn.jars documentation, it says the following:
List of libraries containing Spark code to distribute to YARN containers. By default, Spark on YARN will use Spark jars installed locally, but the Spark jars can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to jars on HDFS, for example, set this configuration to hdfs:///some/path. Globs are allowed.
This means that you are effectively overriding SPARK_HOME/jars and telling YARN to pick up all the jars required to run the application from your path. If you set the spark.yarn.jars property, every jar Spark needs to run must be present at that path. For instance, the org.apache.spark.deploy.yarn.ApplicationMaster class lives inside the spark-assembly.jar found in SPARK_HOME/lib, so make sure that all the Spark dependencies are present in the HDFS path that you specify as spark.yarn.jars.
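For example, a minimal way to stage the jars and wire up the property might look like this (the HDFS directory and namenode host/port are placeholders, and I'm assuming the Spark 2.x layout where the jars live under $SPARK_HOME/jars; adjust to your setup):

# upload every jar Spark ships with to a world-readable HDFS directory (assumed path)
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put $SPARK_HOME/jars/*.jar /user/spark/share/lib/

# then point the property at that directory in conf/spark-defaults.conf
spark.yarn.jars hdfs://namenode:9000/user/spark/share/lib/*.jar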
I was finally able to make sense of this property. I found by trial and error that the correct syntax is:
spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar
Previously I didn't put *.jar at the end, and my path just ended with /lib. I also tried pointing at the actual assembly jar, like this:
spark.yarn.jars=hdfs://sanjeevd.brickred:9000/user/spark/share/lib/spark-yarn_2.11-2.0.1.jar
but no luck; all it said was that it was unable to load the ApplicationMaster.
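For anyone who wants to sanity-check this end to end, here is a rough sketch of how the property can be set and then exercised (the spark-submit invocation uses the stock SparkPi example that ships with Spark; the exact example jar name depends on your Spark/Scala versions, and the namenode host is the placeholder from above):

# conf/spark-defaults.conf
spark.yarn.jars hdfs://xx:9000/user/spark/share/lib/*.jar

# quick smoke test against YARN
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.0.1.jar 10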
I also posted this as an answer to a similar question: https://stackoverflow.com/a/41179608/2332121
You could also use the spark.yarn.archive option and set it to the location of an archive (that you create) containing all the JARs in the $SPARK_HOME/jars/ folder, at the root level of the archive. For example:
jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
hdfs dfs -put spark-libs.jar /some/path/
hdfs dfs -setrep -w 10 hdfs:///some/path/spark-libs.jar
(Change the number of replicas in proportion to the total number of NodeManagers.) Then set spark.yarn.archive to hdfs:///some/path/spark-libs.jar.
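Concretely, that last step is just a one-line setting (a sketch, using the same placeholder path as above):

# conf/spark-defaults.conf
spark.yarn.archive hdfs:///some/path/spark-libs.jar

The same setting can also be passed per job with --conf spark.yarn.archive=hdfs:///some/path/spark-libs.jar on the spark-submit command line.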