How are containers created based on vcores and memory in MapReduce2?


I have a tiny cluster composed of 1 master (namenode, secondarynamenode, resourcemanager) and 2 slaves (datanode, nodemanager).

I have set in the yarn-site.xml of th

1 Answer
  • 2020-12-24 10:00

    I will answer this question on the assumption that the scheduler used is the CapacityScheduler.

    The CapacityScheduler uses a ResourceCalculator to work out the resources needed for an application. There are two types of resource calculators:

    1. DefaultResourceCalculator: takes only memory into account when doing the resource calculations (i.e. when calculating the number of containers)
    2. DominantResourceCalculator: takes both memory and CPU into account for the resource calculations (a small comparison sketch follows this list)
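
    To make the difference concrete, here is a minimal sketch. The node capacity and request sizes are made-up example numbers, and the class below is not Hadoop code: with a node offering 8192 MB / 4 vCores and a request of 1024 MB / 2 vCores, the DefaultResourceCalculator would fit 8 such containers on the node, while the DominantResourceCalculator is limited by the vCore dimension and fits only 2.

      // Hypothetical illustration, not Hadoop code: how many of the requested
      // containers fit on one node under each calculator's view of resources.
      public class CalculatorComparison {
          public static void main(String[] args) {
              int nodeMemMb = 8192, nodeVcores = 4;   // node capacity (example values)
              int reqMemMb  = 1024, reqVcores  = 2;   // container request (example values)

              // DefaultResourceCalculator: only memory is considered
              int byMemoryOnly = nodeMemMb / reqMemMb;                    // 8

              // DominantResourceCalculator: bounded by the scarcer dimension
              int byBothDims = Math.min(nodeMemMb / reqMemMb,
                                        nodeVcores / reqVcores);         // 2

              System.out.println("DefaultResourceCalculator fits:  " + byMemoryOnly);
              System.out.println("DominantResourceCalculator fits: " + byBothDims);
          }
      }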

    By default, the CapacityScheduler uses the DefaultResourceCalculator. If you want to use the DominantResourceCalculator, you need to set the following property in the "capacity-scheduler.xml" file:

      <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
      </property>
    

    Now, to answer your questions:

    1. If DominantResourceCalculator is used, then both memory and VCores are taken into account for calculating the number of containers

    2. mapreduce.map.memory.mb is not an abstract value. It is taken into consideration while calculating the resources.

      The DominantResourceCalculator class has a normalize() function, which normalizes the resource request using a minimum resource (determined by the config yarn.scheduler.minimum-allocation-mb), a maximum resource (determined by the config yarn.scheduler.maximum-allocation-mb) and a step factor (which, for the CapacityScheduler, is also taken from yarn.scheduler.minimum-allocation-mb).
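
      For reference, those limits live in yarn-site.xml. The values below are purely illustrative (they match the worked example further down, not necessarily your cluster):

        <!-- Illustrative values only -->
        <property>
          <name>yarn.scheduler.minimum-allocation-mb</name>
          <value>512</value>
        </property>
        <property>
          <name>yarn.scheduler.maximum-allocation-mb</name>
          <value>1024</value>
        </property>
        <property>
          <name>yarn.scheduler.minimum-allocation-vcores</name>
          <value>1</value>
        </property>
        <property>
          <name>yarn.scheduler.maximum-allocation-vcores</name>
          <value>4</value>
        </property>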

      The code for normalizing memory looks like this (see org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.java):

      int normalizedMemory = Math.min(
          roundUp(Math.max(r.getMemory(), minimumResource.getMemory()),
                  stepFactor.getMemory()),
          maximumResource.getMemory());
      

    Where:

    r = Requested memory

    The logic works as follows:

    a. Take the max of (requested resource, minimum resource) = max(768, 512) = 768

    b. roundUp(768, stepFactor) = roundUp(768, 512) = 1024

    roundUp does: ((768 + (512 - 1)) / 512) * 512 = (1279 / 512) * 512 = 2 * 512 = 1024 (integer division)


    c. min(roundUp(768, stepFactor), maximumResource) = min(1024, 1024) = 1024

    So finally, the allotted memory is 1024 MB, which is what you are getting.

    For the sake of simplicity, you can say that roundUp increments the demand in steps of 512 MB (which is the minimum resource).
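
    If you want to reproduce that arithmetic outside Hadoop, here is a minimal, self-contained sketch (my own helper names, not the actual Hadoop class) using the numbers from this example:

      public class NormalizeMemoryDemo {
          // Same rounding idea as the resource normalization above:
          // round 'num' up to the next multiple of 'stepFactor' (integer arithmetic).
          static int roundUp(int num, int stepFactor) {
              return ((num + stepFactor - 1) / stepFactor) * stepFactor;
          }

          static int normalize(int requested, int minimum, int maximum, int stepFactor) {
              return Math.min(roundUp(Math.max(requested, minimum), stepFactor), maximum);
          }

          public static void main(String[] args) {
              // Walk-through values: request 768 MB, minimum/step 512 MB, maximum 1024 MB
              System.out.println(normalize(768, 512, 1024, 512));   // prints 1024
          }
      }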

    3. Since the mapper is a Java process, mapreduce.map.java.opts is used for specifying the heap size for the mapper.

    Whereas mapreduce.map.memory.mb is the total memory available to the container.

    The value of mapreduce.map.java.opts should be less than mapreduce.map.memory.mb.

    This is explained in the answer to: What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?
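
    As an illustration only (example values, not taken from your configuration), the two settings are usually kept with some headroom between them, e.g. in mapred-site.xml or per job:

      <!-- Example values only: keep the JVM heap below the container size -->
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
      </property>
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx819m</value>   <!-- roughly 80% of mapreduce.map.memory.mb -->
      </property>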

    4. When you use the DominantResourceCalculator, it uses the normalize() function to calculate the vCores needed as well.

      The code for that is (similar to normalization of memory):

        int normalizedCores = Math.min(
            roundUp(Math.max(r.getVirtualCores(), minimumResource.getVirtualCores()),
                    stepFactor.getVirtualCores()),
            maximumResource.getVirtualCores());
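
      Applied to example numbers (a request for 3 vCores, a minimum/step of 1 and a maximum of 4, all hypothetical), the same formula gives 3:

        public class NormalizeVcoresDemo {
            public static void main(String[] args) {
                int requested = 3, minimum = 1, maximum = 4, stepFactor = 1;   // example values
                // Round the request up to the next multiple of stepFactor, then cap at maximum
                int rounded = ((Math.max(requested, minimum) + stepFactor - 1) / stepFactor)
                              * stepFactor;                                    // -> 3
                System.out.println(Math.min(rounded, maximum));                // prints 3
            }
        }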
      