Apache Spark: “failed to launch org.apache.spark.deploy.worker.Worker” or Master

Submitted by 我与影子孤独终老i on 2019-12-02 21:02:55

The Spark configuration system is a mess of environment variables, argument flags, and Java Properties files. I just spent a couple of hours tracking down the same warning and unraveling the Spark initialization procedure, and here's what I found:

  1. sbin/start-all.sh calls sbin/start-master.sh (and then sbin/start-slaves.sh)
  2. sbin/start-master.sh calls sbin/spark-daemon.sh start org.apache.spark.deploy.master.Master ...
  3. sbin/spark-daemon.sh start ... forks off a call to bin/spark-class org.apache.spark.deploy.master.Master ..., captures the resulting process id (pid), sleeps for 2 seconds, and then checks whether that pid's command's name is "java"
  4. bin/spark-class is a bash script, so it starts out with the command name "bash", and proceeds to:
    1. (re-)load the Spark environment by sourcing bin/load-spark-env.sh
    2. find the java executable
    3. find the right Spark jar
    4. call java ... org.apache.spark.launcher.Main ... to get the full classpath needed for a Spark deployment
    5. finally hand over control, via exec, to java ... org.apache.spark.deploy.master.Master, at which point the command name becomes "java"
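
The check in step 3 and the name change in step 4.5 can be reproduced without Spark. This is a minimal sketch, not Spark's actual code: `sleep` stands in for `java`, the inner `bash -c` stands in for bin/spark-class, and `comm_of` and `check_launch` are helper names I made up. It assumes bash on a Linux-like system where `ps -o comm=` (or /proc) reports the command name.

```shell
#!/usr/bin/env bash
# Sketch only: "sleep" plays the role of "java", and the inner bash script
# plays the role of bin/spark-class. Helper names are made up for illustration.

# Print the command name of a pid (falls back to /proc if ps is unavailable).
comm_of() {
  ps -p "$1" -o comm= 2>/dev/null || cat "/proc/$1/comm" 2>/dev/null
}

# check_launch SETUP_SECS WAIT_SECS
# Fork a bash wrapper that spends SETUP_SECS on "setup" and then execs the
# real program, wait WAIT_SECS, and report based on the pid's command name.
check_launch() {
  local setup_secs=$1 wait_secs=$2 pid result
  bash -c 'sleep "$1"; exec sleep 30' _ "$setup_secs" >/dev/null 2>&1 &
  pid=$!
  sleep "$wait_secs"
  if [[ "$(comm_of "$pid")" =~ sleep ]]; then
    result="launched"
  else
    result="failed to launch"   # still "bash": setup has not finished yet
  fi
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
  echo "$result"
}

check_launch 0 1   # setup finishes before the check
check_launch 3 1   # setup is still running when the check happens
```

In the second call nothing has gone wrong with the process; the check simply looked too early, while the pid still belonged to the bash wrapper.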

If steps 4.1 through 4.5 take longer than 2 seconds, which in my (and your) experience seems pretty much inevitable on a fresh OS where java has never been previously run, you'll get the "failed to launch" message, despite nothing actually having failed.
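
Since the message can be a false alarm, one option is to keep checking for longer than 2 seconds before concluding anything. Here is a rough sketch that polls a pid's command name up to a timeout. `wait_for_comm` is a hypothetical helper, not part of Spark, and the demo uses a stand-in process (`sleep` in place of `java`) rather than a real master.

```shell
#!/usr/bin/env bash
# wait_for_comm PID NAME TIMEOUT_SECS
# Poll once per second until PID's command name matches NAME.
# Returns 0 on a match, 1 if the process is gone or the timeout expires.
wait_for_comm() {
  local pid=$1 name=$2 timeout=$3 comm i
  for ((i = 0; i <= timeout; i++)); do
    comm=$(ps -p "$pid" -o comm= 2>/dev/null || cat "/proc/$pid/comm" 2>/dev/null || true)
    [[ -z "$comm" ]] && return 1        # process no longer exists
    [[ "$comm" =~ $name ]] && return 0  # e.g. bash has exec'd into the real program
    sleep 1
  done
  return 1
}

# Demo with a stand-in: a bash wrapper that execs "sleep" after 2 seconds.
bash -c 'sleep 2; exec sleep 30' >/dev/null 2>&1 &
demo_pid=$!
if wait_for_comm "$demo_pid" sleep 10; then
  echo "up"
else
  echo "not up"
fi
kill "$demo_pid" 2>/dev/null || true
```

Against a real deployment you would pass the daemon's pid and `java` as the name, with a timeout generous enough for a cold JVM start.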

The slaves will complain for the same reason, and thrash around until the master is actually available, but they should keep retrying until they successfully connect to the master.

I've got a pretty standard Spark deployment running on EC2; I use:

  • conf/spark-defaults.conf to set spark.executor.memory and add some custom jars via spark.{driver,executor}.extraClassPath
  • conf/spark-env.sh to set SPARK_WORKER_CORES=$(($(nproc) * 2))
  • conf/slaves to list my slaves
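
For reference, the conf/spark-env.sh entry is plain shell arithmetic, because that file gets sourced as a shell script. A minimal sketch of what the line evaluates to, assuming `nproc` (GNU coreutils) is available on the worker:

```shell
#!/usr/bin/env bash
# What a conf/spark-env.sh containing this line would set when sourced.
# nproc prints the number of processing units available to this process.
SPARK_WORKER_CORES=$(($(nproc) * 2))
export SPARK_WORKER_CORES

echo "cores reported by nproc: $(nproc)"
echo "SPARK_WORKER_CORES:      $SPARK_WORKER_CORES"
```

So each worker offers twice as many cores to Spark as the machine reports.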

Here's how I start a Spark deployment, bypassing some of the {bin,sbin}/*.sh minefield/maze:

# on master, with SPARK_HOME and conf/slaves set appropriately
mapfile -t ARGS < <(java -cp "$SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar" org.apache.spark.launcher.Main org.apache.spark.deploy.master.Master | tr '\0' '\n')
# the ARGS array now contains the full command to start the master, which I daemonize with nohup
SPARK_PUBLIC_DNS=0.0.0.0 nohup "${ARGS[@]}" >> "$SPARK_HOME/master.log" 2>&1 < /dev/null &
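
The `tr '\0' '\n'` step is there because the launcher's output is a list of NUL-separated arguments rather than lines. Here is a small sketch of the same pattern with a stand-in command. `emit_args` and the arguments it prints are made up for illustration and are not real launcher output.

```shell
#!/usr/bin/env bash
# Stand-in for the launcher: prints each argument followed by a NUL byte.
emit_args() {
  printf '%s\0' java -Xmx1g -cp '/opt/example/conf:/opt/example/lib/*' org.example.Main
}

# Turn NULs into newlines, then read one array element per line.
mapfile -t ARGS < <(emit_args | tr '\0' '\n')

echo "number of arguments: ${#ARGS[@]}"
printf '<%s>\n' "${ARGS[@]}"
```

Each argument stays a single array element even if it contains spaces or a `*`, which is why the array is later expanded as `"${ARGS[@]}"`. On bash 4.4 or newer, `mapfile -d '' -t ARGS < <(emit_args)` reads the NUL-separated stream directly and also copes with arguments that contain newlines.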

I'm still using sbin/spark-daemon.sh to start the slaves, since that's easier than calling nohup within the ssh command:

MASTER=spark://$(hostname -i):7077
while read -r; do
  # -n keeps ssh from reading the rest of conf/slaves from the loop's stdin
  ssh -n -o StrictHostKeyChecking=no "$REPLY" "$SPARK_HOME/sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 $MASTER" &
done < "$SPARK_HOME/conf/slaves"
# this forks the ssh calls, so wait for them to exit before you log out
wait

There! It assumes that I'm using all the default ports and stuff, and that I'm not doing stupid shit like putting whitespace in filenames, but I think it's cleaner this way.
