Spark Yarn Memory configuration


Question


I have a spark application that keeps failing on error:

"Diagnostics: Container [pid=29328,containerID=container_e42_1512395822750_0026_02_000001] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 2.3 GB of 3.1 GB virtual memory used. Killing container."

I saw lots of different parameters that were suggested for increasing the physical memory. Could I please have some explanation of the following parameters?

  • mapreduce.map.memory.mb (currently set to 0, so it is supposed to take the default of 1 GB; why then do we see 1.5 GB? Changing it also didn't affect the number)

  • mapreduce.reduce.memory.mb (currently set to 0, so it is supposed to take the default of 1 GB; why then do we see 1.5 GB? Changing it also didn't affect the number)

  • mapreduce.map.java.opts/mapreduce.reduce.java.opts (set to 80% of the previous numbers)

  • yarn.scheduler.minimum-allocation-mb=1GB (when I change this I see an effect on the maximum physical memory, but with the value 1 GB it is still 1.5 GB)

  • yarn.app.mapreduce.am.resource.mb/spark.yarn.executor.memoryOverhead (I can't find these in the configuration at all)

We are running YARN (with the yarn-cluster deployment mode) on Cloudera CDH 5.12.1.


Answer 1:


spark.driver.memory
spark.executor.memory

These control the base amount of memory Spark will try to allocate for its driver and for all of its executors. These are probably the ones you want to increase if you are running out of memory.
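For instance, a minimal spark-submit sketch (the 2g/4g values and the jar name are placeholders, not recommendations):

# illustrative submission; tune the sizes to your workload
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.memory=2g \
  --conf spark.executor.memory=4g \
  your-app.jar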

// options before Spark 2.3.0
spark.yarn.driver.memoryOverhead
spark.yarn.executor.memoryOverhead

// options in Spark 2.3.0 and later
spark.driver.memoryOverhead
spark.executor.memoryOverhead

These values specify an additional amount of memory to request when running Spark on YARN. They are intended to account for the extra RAM needed by the YARN container that hosts your Spark executors.
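As a rough worked example (assuming the usual Spark-on-YARN default, where the overhead is the larger of 384 MB and 10% of the corresponding memory setting), the per-executor container request comes out like this:

# illustrative arithmetic, assuming overhead = max(384 MB, 0.10 * memory)
spark.executor.memory  = 1024 MB
memoryOverhead         = max(384, 0.10 * 1024) = 384 MB
container request      = 1024 + 384 = 1408 MB
# YARN then rounds this request up to its allocation granularity,
# which is one plausible way a nominal 1 GB shows up as a 1.5 GB container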

yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb

When Spark asks YARN to reserve a block of RAM for an executor, it asks for the base memory plus the overhead memory. However, YARN may not grant a container of exactly that size. These parameters control the smallest and largest container sizes that YARN will grant. If you are only using the cluster for one job, I find it easiest to set these to very small and very large values and then use the Spark memory settings mentioned above to set the true container size.
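A minimal sketch of the corresponding yarn-site.xml properties (the values are illustrative; note that on CDH the Fair Scheduler also rounds each request up to a multiple of yarn.scheduler.increment-allocation-mb, so e.g. with a 512 MB increment a 1408 MB request becomes 1536 MB, i.e. the 1.5 GB you are seeing):

# illustrative values only -- pick limits that fit your cluster
yarn.scheduler.minimum-allocation-mb = 512
yarn.scheduler.maximum-allocation-mb = 16384
# requests between these bounds are rounded up to the scheduler's
# increment before the container is granted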

mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts/mapreduce.reduce.java.opts

I don't think these have any bearing on your Spark/YARN job.



Source: https://stackoverflow.com/questions/47701102/spark-yarn-memory-configuration
