How to prevent a Spark executor from getting lost and the YARN container from killing it due to memory limits?

慢半拍i 2020-12-13 20:50

I have the following code, which fires hiveContext.sql() most of the time. My task is to create a few tables and insert values into them after processing for all

2 Answers
  •  [愿得一人]
    2020-12-13 21:19

    My assumption is that you have a limited number of executors on your cluster and that the job is running in a shared environment.

    As you said, your file size is small, so you can set a smaller number of executors and increase the cores per executor; setting the memoryOverhead property is important here.

    1. Set number of executors = 5
    2. Set number of executor cores = 4
    3. Set memory overhead = 2G
    4. Set shuffle partitions = 20 (to use the maximum parallelism available: 5 executors x 4 cores)

    With the above properties, I am sure you will avoid executor out-of-memory issues without compromising performance.
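    As a rough illustration only, here is a minimal PySpark sketch of how these settings could be applied when building the session; the application name, the base executor memory value, and the placeholder query are assumptions, and the overhead property shown is the older YARN name (`spark.yarn.executor.memoryOverhead`, in MB), which later Spark releases renamed to `spark.executor.memoryOverhead`.

    ```python
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Settings mirroring the answer: 5 executors, 4 cores each,
    # 2 GB memory overhead, 20 shuffle partitions.
    conf = (
        SparkConf()
        .setAppName("hive-insert-job")                       # hypothetical app name
        .set("spark.executor.instances", "5")
        .set("spark.executor.cores", "4")
        .set("spark.executor.memory", "4g")                  # assumed base heap size
        .set("spark.yarn.executor.memoryOverhead", "2048")   # 2 GB off-heap overhead, in MB
        .set("spark.sql.shuffle.partitions", "20")           # 5 executors x 4 cores
    )

    spark = (
        SparkSession.builder
        .config(conf=conf)
        .enableHiveSupport()    # replaces the old HiveContext for Hive table access
        .getOrCreate()
    )

    # Placeholder for the actual processing; the question's real queries go here.
    spark.sql("SHOW TABLES").show()
    ```

    The same values can equally be passed as `--conf` options (or `--num-executors`, `--executor-cores`) on the spark-submit command line, which is often preferable in a shared cluster so the resource request is visible to YARN up front.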
