Spark job fails because it can't find the hadoop core-site.xml


Question


I'm trying to run a Spark job and I'm getting this error when I try to start the driver:

16/05/17 14:21:42 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/var/lib/mesos/slave/slaves/0c080f97-9ef5-48a6-9e11-cf556dfab9e3-S1/frameworks/5c37bb33-20a8-4c64-8371-416312d810da-0002/executors/driver-20160517142123-0183/runs/802614c4-636c-4873-9379-b0046c44363d/core-site.xml does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at com.spark.test.SparkJobRunner.main(SparkJobRunner.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I have Spark running on several servers that are part of my Mesos cluster (not sure that's right, but that's what I'm doing), and I also have Hadoop running on those servers. I started the Spark master on one server and then started the Spark slaves on the other servers.

I have three apps (not that it matters): a UI where the user can kick off Spark jobs, which puts the jobs on a Kafka queue; a launcher app that creates the Spark job using SparkLauncher (see the code below); and my Spark driver, which connects to the Kafka queue and processes the requests sent in from the UI. The UI and the launcher run in Marathon. Spark, as stated above, is its own process on the cluster, and the driver connects to Spark to run the jobs.

EDIT: I have uploaded hdfs-site.xml, core-site.xml and spark-env.sh to Hadoop and point to them in my Spark context:

import org.apache.spark.SparkConf;

// config, sparkMaster, and hadoopPath are defined elsewhere in my code.
SparkConf conf = new SparkConf()
                .setAppName(config.getString(SPARK_APP_NAME))
                .setMaster(sparkMaster)
                .setExecutorEnv("HADOOP_USER_NAME", config.getString(HADOOP_USER, ""))
                // URIs Mesos should fetch into the sandbox when it launches a task
                .set("spark.mesos.uris", "<hadoop node>:9000/config/core-site.xml,<hadoop node>:9000/config/hdfs-site.xml")
                // files the driver distributes to executors via SparkContext.addFile()
                .set("spark.files", "core-site.xml,hdfs-site.xml,spark-env.sh")
                .set("spark.mesos.coarse", "true")
                .set("spark.cores.max", config.getString(SPARK_CORES_MAX))
                .set("spark.driver.memory", config.getString(SPARK_DRIVER_MEMORY))
                .set("spark.driver.extraJavaOptions", config.getString(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, ""))
                .set("spark.executor.memory", config.getString(SPARK_EXECUTOR_MEMORY))
                .set("spark.executor.extraJavaOptions", config.getString(SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS))
                .set("spark.executor.uri", hadoopPath);

Here is the code that launches the driver:

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// startApplication() throws IOException, so this runs inside a method that declares it.
SparkLauncher launcher = new SparkLauncher()
            .setMaster(<my spark/mesos master>)
            .setDeployMode("cluster")
            .setSparkHome("/home/spark")
            .setAppResource(<hdfs://path/to/a/spark.jar>)
            .setMainClass(<my main class>);
SparkAppHandle handle = launcher.startApplication();
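
Not the cause of the error, but since startApplication() returns a SparkAppHandle, the launcher can watch the driver instead of firing and forgetting. A rough sketch using only the launcher API; the master URL, jar path, and main class below are placeholders, not my real values:

import java.io.IOException;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchAndWatch {
    public static void main(String[] args) throws IOException, InterruptedException {
        SparkAppHandle handle = new SparkLauncher()
                .setMaster("mesos://master:5050")                       // placeholder master URL
                .setDeployMode("cluster")
                .setSparkHome("/home/spark")
                .setAppResource("hdfs://namenode:9000/jobs/spark.jar")  // placeholder jar path
                .setMainClass("com.example.Main")                       // placeholder main class
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        System.out.println("Driver state: " + h.getState());
                    }
                    @Override
                    public void infoChanged(SparkAppHandle h) { }
                });

        // Poll until the driver reaches a terminal state (FINISHED, FAILED, KILLED, ...).
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
    }
}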

I'm sure I'm doing something wrong; I just can't figure out what. I'm new to Spark, Hadoop and Mesos, so feel free to point out anything else I'm doing wrong.


Answer 1:


My problem was that I hadn't set HADOOP_CONF_DIR in $SPARK_HOME/conf/spark-env.sh on each server in my cluster. Once I set that, my Spark job started correctly. I also realized I didn't need to include the core-site.xml, hdfs-site.xml, or spark-env.sh files in the SparkConf, so I removed the line that sets "spark.files".
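
For anyone hitting the same thing, a quick sanity check (a minimal sketch, not my actual job code): with HADOOP_CONF_DIR exported in spark-env.sh, the driver's Hadoop configuration should report the fs.defaultFS value from core-site.xml instead of the built-in file:/// default. The local[1] master here is just for the check:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckHadoopConf {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("check-hadoop-conf")
                .setMaster("local[1]"); // placeholder; the real job uses the mesos:// master
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Prints the value from core-site.xml (e.g. an hdfs:// URI) when
            // HADOOP_CONF_DIR is on the classpath; otherwise the default file:///.
            System.out.println("fs.defaultFS = " + sc.hadoopConfiguration().get("fs.defaultFS"));
        } finally {
            sc.stop();
        }
    }
}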



Source: https://stackoverflow.com/questions/37286954/spark-job-fails-because-it-cant-find-the-hadoop-core-site-xml
