Why do Spark jobs fail with org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 in speculation mode?

面向向阳花 2020-12-07 09:55

I'm running a Spark job in speculation mode. I have around 500 tasks and around 500 files of about 1 GB each, gz compressed. In every job, for 1-2 tasks, I keep getting the error shown in the title (MetadataFetchFailedException: Missing an output location for shuffle 0).
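
For context, "speculation mode" refers to Spark's speculative execution, which relaunches slow-running tasks on other executors. A minimal sketch of how it is typically enabled, assuming submission via spark-submit; the flags and jar name below are illustrative, not taken from the question:

    # Sketch: enable speculative execution at submit time (assumed, illustrative values)
    spark-submit \
      --conf spark.speculation=true \
      --conf spark.speculation.quantile=0.75 \
      your_job.jar   # hypothetical application jar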

8 Answers
  • 2020-12-07 10:17

    In the Spark Web UI, if you see information such as "Executors lost", check the YARN logs to confirm whether your container was killed.

    If the container was killed, it is probably due to a lack of memory.

    How do you find the key information in the YARN logs? Look for warnings like this one:

    Container killed by YARN for exceeding memory limits. 2.5 GB of 2.5 GB physical memory used. 
    Consider boosting spark.yarn.executor.memoryOverhead.
    

    In this case, the warning suggests increasing spark.yarn.executor.memoryOverhead.
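
    One way to apply that suggestion is to pass the setting when submitting the job. A minimal sketch, assuming submission via spark-submit; the 2048 MB value and the jar name are illustrative assumptions, not taken from this answer:

    # Sketch: raise the per-executor off-heap overhead (value is an assumed example)
    spark-submit \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      --executor-memory 4G \
      your_job.jar   # hypothetical application jar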

  • 2020-12-07 10:19

    I solved this error by increasing the memory allocated via executorMemory and driverMemory. You can do this in HUE by selecting the Spark program that is causing the problem and, under Properties -> Option list, adding something like this:

    --driver-memory 10G --executor-memory 10G --num-executors 50 --executor-cores 2
    

    Of course, the values of these parameters will vary depending on your cluster's size and your needs.
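
    If you submit from the command line rather than through HUE, the same options can be passed directly to spark-submit. A sketch using the same illustrative values; the application jar name is an assumption:

    spark-submit \
      --driver-memory 10G --executor-memory 10G \
      --num-executors 50 --executor-cores 2 \
      your_job.jar   # hypothetical application jar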
