Question
I am trying to create a state diagram of a submitted Spark application, and I am kind of lost on when an application is considered FAILED.
States are from here: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/core/src/main/scala/org/apache/spark/deploy/master/DriverState.scala
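For reference, the driver states in the linked file are defined as a plain Scala enumeration (abridged here from the Spark source; the inline comments paraphrase the originals):

```scala
private[deploy] object DriverState extends Enumeration {

  type DriverState = Value

  // SUBMITTED: submitted but not yet scheduled on a worker
  // RUNNING: has been allocated to a worker to run
  // FINISHED: previously ran and exited cleanly
  // RELAUNCHING: exited non-zero or due to worker failure,
  //              but has not yet started running again
  // UNKNOWN: state temporarily not known due to master failure recovery
  // KILLED: a user manually killed this driver
  // FAILED: the driver exited non-zero and was not supervised
  // ERROR: unable to run or restart due to an unrecoverable error
  //        (e.g. missing jar file)
  val SUBMITTED, RUNNING, FINISHED, RELAUNCHING,
      UNKNOWN, KILLED, FAILED, ERROR = Value
}
```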
Answer 1:
This part is very important, since when it comes to Big Data, Spark is awesome, but let's face it, we haven't solved the failure problem yet!
When a task/job fails, Spark restarts it (recall that the RDD, the main abstraction Spark provides, is a Resilient Distributed Dataset; that is not exactly what we are looking at here, but it gives the right intuition).
I use Spark 1.6.2, and my cluster retries the job/task 3 times before it is marked as FAILED.
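These retry thresholds are ordinary Spark settings, not fixed behaviour. A minimal sketch, assuming a YARN deployment (the config keys below are the standard Spark ones; the app name is hypothetical and not from the original post):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("retry-demo") // hypothetical app name
  // Task failures tolerated before the whole stage (and job) is failed.
  // The default in Spark 1.6 is 4.
  .set("spark.task.maxFailures", "4")
  // On YARN only: application attempts before the app is finally FAILED.
  .set("spark.yarn.maxAppAttempts", "3")

val sc = new SparkContext(conf)
```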
For example, one of my recent jobs had to restart a whole stage. In the cluster UI one can see the attempt IDs; in my case the application was in its 3rd and final attempt.
If that attempt is marked as FAILED (for whatever reason, e.g. out of memory, bad DNS, GC allocation failure, disk failure, a node not responding to 4 consecutive heartbeats and probably being down, etc.), then Spark relaunches the job.
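If you want to observe these failures (and the retry attempts) as they happen, the public SparkListener API exposes the failure reason for every task. A minimal sketch; the class name FailureLogger is mine, not from the post:

```scala
import org.apache.spark.TaskFailedReason
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class FailureLogger extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    taskEnd.reason match {
      case failure: TaskFailedReason =>
        // attemptNumber tells you which retry this was (0-based)
        println(s"Task ${taskEnd.taskInfo.taskId} " +
          s"(attempt ${taskEnd.taskInfo.attemptNumber}) failed: " +
          failure.toErrorString)
      case _ => // task succeeded, nothing to log
    }
  }
}

// Register it on an existing SparkContext:
// sc.addSparkListener(new FailureLogger)
```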
Source: https://stackoverflow.com/questions/39172115/what-is-the-difference-between-failed-and-error-in-spark-application-states