How to avoid Spark executors getting lost and YARN containers killing them due to memory limits?

大城市里の小女人 submitted on 2019-11-28 19:34:09
Barak1731475

Generally, you should always dig into the logs to find the real exception (at least in Spark 1.3.1).

tl;dr
Safe config for Spark under YARN:
spark.shuffle.memoryFraction=0.5 - this allows the shuffle to use more of the allocated memory
spark.yarn.executor.memoryOverhead=1024 - this is set in MB. YARN kills an executor when its memory usage is larger than (executor-memory + executor.memoryOverhead)
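
As a rough sketch of how these two settings could be passed on the command line, the snippet below builds a list of spark-submit `--conf` arguments. The property names and values are the ones quoted above (Spark 1.x era; later releases renamed or deprecated them, so check the configuration docs for your version). The helper `conf_args` and the jar name `your_app.jar` are hypothetical placeholders, not part of the original answer.

```python
# Settings quoted from the answer above (Spark 1.x property names).
SAFE_CONF = {
    "spark.shuffle.memoryFraction": "0.5",
    "spark.yarn.executor.memoryOverhead": "1024",  # value is in MB
}


def conf_args(conf):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# "your_app.jar" is a hypothetical placeholder for the real application.
command = ["spark-submit"] + conf_args(SAFE_CONF) + ["your_app.jar"]
print(" ".join(command))
```

The same properties can also go into spark-defaults.conf instead of the command line.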

A little more info

From your question, you mention that you get a shuffle-not-found exception.

In the case of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle, you should increase spark.shuffle.memoryFraction, for example to 0.5.

The most common reason for YARN killing off my executors was memory usage beyond what it expected. To avoid that, increase spark.yarn.executor.memoryOverhead; I've set it to 1024, even though my executors use only 2-3G of memory.
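
To make the limit concrete, here is a small sketch of the arithmetic described above: the container is killed once usage exceeds executor memory plus overhead. The function names and the sample numbers (a 3 GB executor, 3500 MB of actual usage, and a smaller 384 MB overhead for comparison) are illustrative assumptions, not values from the question.

```python
def container_limit_mb(executor_memory_mb, overhead_mb):
    """Memory ceiling per executor: executor-memory + memoryOverhead."""
    return executor_memory_mb + overhead_mb


def would_be_killed(used_mb, executor_memory_mb, overhead_mb):
    """True when usage exceeds the executor-memory + overhead ceiling."""
    return used_mb > container_limit_mb(executor_memory_mb, overhead_mb)


executor_memory_mb = 3 * 1024  # hypothetical 3 GB executor
used_mb = 3500                 # hypothetical actual usage of the process

# With a small overhead the ceiling is 3456 MB, so 3500 MB is over the limit.
print(would_be_killed(used_mb, executor_memory_mb, 384))
# With the 1024 MB overhead from the answer the ceiling is 4096 MB.
print(would_be_killed(used_mb, executor_memory_mb, 1024))
```

This only models the sum stated in the answer; a real cluster may also round container sizes according to its scheduler settings.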

NoName

This is my assumption: you probably have a limited number of executors on your cluster, and the job might be running in a shared environment.

As you said, your file size is small, so you can set a smaller number of executors and increase the executor cores. Setting the memoryOverhead property is important here.

  1. Set number of executors = 5
  2. Set number of executor cores = 4
  3. Set memory overhead = 2G
  4. Set shuffle partitions = 20 (to use maximum parallelism based on executors and cores)

With the above properties, I am sure you will avoid executor out-of-memory issues without compromising performance.
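
The four settings above might map onto Spark properties roughly as follows. This is a sketch under assumptions: the answer does not name the exact properties, so treating "shuffle partition" as `spark.sql.shuffle.partitions` and "memory overhead" as `spark.yarn.executor.memoryOverhead` (in MB) is a guess. It also shows where the figure of 20 comes from: 5 executors times 4 cores gives 20 task slots.

```python
NUM_EXECUTORS = 5
EXECUTOR_CORES = 4
MEMORY_OVERHEAD_MB = 2 * 1024  # 2G expressed in MB

# One task per core across all executors: 5 * 4 = 20 concurrent tasks.
task_slots = NUM_EXECUTORS * EXECUTOR_CORES

conf = {
    "spark.executor.instances": str(NUM_EXECUTORS),
    "spark.executor.cores": str(EXECUTOR_CORES),
    # Assumption: the overhead property used elsewhere on this page.
    "spark.yarn.executor.memoryOverhead": str(MEMORY_OVERHEAD_MB),
    # Assumption: "shuffle partition" refers to the Spark SQL setting.
    "spark.sql.shuffle.partitions": str(task_slots),
}

for key, value in conf.items():
    print(f"{key}={value}")
```

Matching the partition count to the number of task slots means every core gets one shuffle task per stage; for larger inputs a multiple of the slot count may be a better fit.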
