Spark on YARN resource manager: Relation between YARN Containers and Spark Executors


I'm new to Spark on YARN and don't understand the relation between YARN containers and Spark executors. I tried out the following configuration:

1 Answer

    I will report my insights here step by step:

    • The first important thing is this fact (source: the Cloudera documentation):

      When running Spark on YARN, each Spark executor runs as a YARN container. [...]

    • This means a Spark application gets one container per executor, e.g. as requested via the --num-executors parameter of spark-submit, plus one additional container for the YARN ApplicationMaster (see the spark-submit sketch after this list).

    • yarn.scheduler.minimum-allocation-mb sets a floor: every container allocates at least this amount of memory. This means that if --executor-memory is set to e.g. only 1g but yarn.scheduler.minimum-allocation-mb is e.g. 6g, the container is much bigger than the Spark application needs.

    • The other way round, if --executor-memory is set to something higher than the yarn.scheduler.minimum-allocation-mb value, e.g. 12g, YARN allocates a correspondingly bigger container, but only if the requested amount of memory is smaller than or equal to the yarn.scheduler.maximum-allocation-mb value. (Note that the amount YARN actually sees requested is --executor-memory plus the executor memory overhead, by default 10% of the executor memory with a 384 MB floor.)

    • The value of yarn.nodemanager.resource.memory-mb determines how much memory can be allocated in total by all containers on one host!
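
    To make this concrete, here is a minimal spark-submit sketch; the master/deploy-mode flags are standard, but the application JAR name and all memory/executor values are made-up placeholders, not recommendations:

    ```bash
    # Ask YARN for 4 executors of 2 GB each: YARN starts 4 executor
    # containers plus 1 ApplicationMaster container. Each container
    # request is --executor-memory plus the memory overhead.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 4 \
      --executor-memory 2g \
      --executor-cores 2 \
      my-app.jar          # placeholder application JAR
    ```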

    => So setting yarn.scheduler.minimum-allocation-mb to a low value allows you to run smaller containers, e.g. for smaller executors (otherwise memory would be wasted).

    => Setting yarn.scheduler.maximum-allocation-mb to the maximum value (e.g. equal to yarn.nodemanager.resource.memory-mb) allows you to define bigger executors (more memory is allocated if requested, e.g. via the --executor-memory parameter). A yarn-site.xml sketch of both settings follows below.
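
    To illustrate both conclusions, here is a yarn-site.xml sketch; the concrete numbers (a 1 GB floor and a 24 GB NodeManager budget) are example assumptions, not recommendations:

    ```xml
    <!-- Example values only; size these to your actual hardware. -->
    <configuration>
      <!-- Floor: no container is ever smaller than 1 GB -->
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
      </property>
      <!-- Ceiling: one container may grow up to the whole NodeManager budget -->
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>24576</value>
      </property>
      <!-- Total memory this NodeManager offers to all containers combined -->
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>24576</value>
      </property>
    </configuration>
    ```

    With these example values, --executor-memory 5g becomes a request of 5632 MB (5120 MB plus the 512 MB default overhead), which the default capacity scheduler rounds up to a multiple of the 1 GB minimum, i.e. a 6 GB container, so at most four such executors fit on one 24 GB NodeManager.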
