Question
I am trying to create a state diagram of a submitted Spark application, and I am kind of lost on when an application is considered FAILED.
States are from here: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/core/src/main/scala/org/apache/spark/deploy/master/DriverState.scala
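For reference, the driver states in the linked file are defined as a plain Scala enumeration (abridged here from the Spark source; the inline comments paraphrase the originals):

```scala
private[deploy] object DriverState extends Enumeration {

  type DriverState = Value

  // SUBMITTED: submitted but not yet scheduled on a worker
  // RUNNING: has been allocated to a worker to run
  // FINISHED: previously ran and exited cleanly
  // RELAUNCHING: exited non-zero or due to worker failure,
  //              but has not yet started running again
  // UNKNOWN: state temporarily not known due to master failure recovery
  // KILLED: a user manually killed this driver
  // FAILED: the driver exited non-zero and was not supervised
  // ERROR: unable to run or restart due to an unrecoverable error
  //        (e.g. missing jar file)
  val SUBMITTED, RUNNING, FINISHED, RELAUNCHING,
      UNKNOWN, KILLED, FAILED, ERROR = Value
}
```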
Answer 1:
This part is very important, since when it comes to Big Data, Spark is awesome, but let's face it, we haven't solved the failure problem yet!
When a task/job fails, Spark restarts it (recall that the RDD, the main abstraction Spark provides, is a Resilient Distributed Dataset; that is not exactly what we are looking at here, but it gives the right intuition).
I use Spark 1.6.2, and my cluster retries the job/task 3 times before it is marked as FAILED.
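These retry thresholds are ordinary Spark settings, not fixed behaviour. A minimal sketch, assuming a YARN deployment (the config keys below are the standard Spark ones; the app name is hypothetical and not from the original post):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("retry-demo") // hypothetical app name
  // Task failures tolerated before the whole stage (and job) is failed.
  // The default in Spark 1.6 is 4.
  .set("spark.task.maxFailures", "4")
  // On YARN only: application attempts before the app is finally FAILED.
  .set("spark.yarn.maxAppAttempts", "3")

val sc = new SparkContext(conf)
```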
For example, one of my recent jobs had to restart a whole stage. In the cluster UI one can see the attempt IDs; in my case the application was in its 3rd and final attempt.
If that attempt is marked as FAILED (for whatever reason, e.g. out of memory, bad DNS, GC allocation failure, disk failure, a node not responding to 4 consecutive heartbeats and probably being down, etc.), then Spark relaunches the job.
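If you want to observe these failures (and the retry attempts) as they happen, the public SparkListener API exposes the failure reason for every task. A minimal sketch; the class name FailureLogger is mine, not from the post:

```scala
import org.apache.spark.TaskFailedReason
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class FailureLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    taskEnd.reason match {
      case failure: TaskFailedReason =>
        // attemptNumber tells you which retry this was (0-based)
        println(s"Task ${taskEnd.taskInfo.taskId} " +
          s"(attempt ${taskEnd.taskInfo.attemptNumber}) failed: " +
          failure.toErrorString)
      case _ => // task succeeded, nothing to log
    }
  }
}

// Register it on an existing SparkContext:
// sc.addSparkListener(new FailureLogger)
```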
Source: https://stackoverflow.com/questions/39172115/what-is-the-difference-between-failed-and-error-in-spark-application-states