I'm running a Spark job in speculation mode. I have around 500 tasks and around 500 gzip-compressed files of 1 GB each. In each job, for 1-2 tasks, I keep getting the attached error.
In the Spark Web UI, if you see information like "Executors lost", check the YARN logs to confirm whether your container was killed (for a finished application, you can fetch the aggregated logs with yarn logs -applicationId <application id>). If the container was killed, it is most likely due to a lack of memory.
How do you find the key info in the YARN logs? Look for warnings like this:
Container killed by YARN for exceeding memory limits. 2.5 GB of 2.5 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.
In this case, the message itself suggests the fix: increase spark.yarn.executor.memoryOverhead. YARN enforces a per-container limit of spark.executor.memory plus this overhead, and the overhead, which defaults to max(384 MB, 10% of the executor memory), covers off-heap allocations such as VM overheads, interned strings, and native buffers, so it is easy to exhaust.
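
To illustrate where the setting goes, here is a minimal sketch, assuming Spark 2.x on YARN; the 1024 MB overhead and the application name are illustrative placeholders, not tuned recommendations:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Illustrative sizing: 4 GB of executor heap plus 1 GB of off-heap headroom,
    // so YARN allows each executor container roughly 5 GB before killing it.
    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")
      .set("spark.yarn.executor.memoryOverhead", "1024") // MB; this key was
      // renamed to "spark.executor.memoryOverhead" in Spark 2.3+

    val spark = SparkSession.builder()
      .appName("gz-ingest-example") // hypothetical application name
      .config(conf)
      .getOrCreate()

The same value can be passed at submit time with spark-submit --conf spark.yarn.executor.memoryOverhead=1024; either way it must be set before the application starts, because executor containers are sized when they are requested from YARN.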