Hi, we have recently upgraded from MR1 to YARN. I know that a container is an abstract notion, but I don't understand how many JVM tasks (map, reduce, filter, etc.) one container can run.
For example, I have a MapReduce application that spawns 10 mappers:
I am running this on a single host with 8 vCores (this value is set by the configuration parameter yarn.nodemanager.resource.cpu-vcores, which defaults to 8). See "YarnConfiguration.java":
/** Number of Virtual CPU Cores which can be allocated for containers.*/
public static final String NM_VCORES = NM_PREFIX + "resource.cpu-vcores";
public static final int DEFAULT_NM_VCORES = 8;
Since there are 10 mappers and 1 Application Master, the total number of containers spawned is 11.
So, a separate container is launched for each map or reduce task.
However, for MapReduce jobs on YARN there is the concept of an uber job, which lets a single container run multiple mappers and 1 reducer (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml: CURRENTLY THE CODE CANNOT SUPPORT MORE THAN ONE REDUCE and will ignore larger values.).
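As a rough illustration of the counting above, here is a minimal sketch. The class and method names are hypothetical, and the limits of 9 maps and 1 reduce are what I believe the defaults of mapreduce.job.ubertask.maxmaps and mapreduce.job.ubertask.maxreduces to be, so please verify them against your own mapred-default.xml. It also ignores the other uber conditions (input size, memory) and any retried or speculative task attempts.

```java
public class ContainerCount {
    // Assumed defaults of mapreduce.job.ubertask.maxmaps / maxreduces; check your mapred-default.xml.
    static final int UBER_MAX_MAPS = 9;
    static final int UBER_MAX_REDUCES = 1;

    /** True when the job is small enough to run inside the Application Master's container. */
    static boolean isUberEligible(boolean uberEnabled, int maps, int reduces) {
        return uberEnabled && maps <= UBER_MAX_MAPS && reduces <= UBER_MAX_REDUCES;
    }

    /** One container for the Application Master, plus one per task unless the job is uberized. */
    static int totalContainers(boolean uberEnabled, int maps, int reduces) {
        if (isUberEligible(uberEnabled, maps, reduces)) {
            return 1; // all tasks run in the Application Master's JVM
        }
        return 1 + maps + reduces;
    }

    public static void main(String[] args) {
        System.out.println(totalContainers(false, 10, 0)); // 10 mappers + 1 AM
        System.out.println(totalContainers(true, 4, 1));   // small job, uber mode enabled
    }
}
```

Under these assumed limits, the 10-mapper job from the question would still get 11 containers even with uber mode enabled, because it has more maps than the limit allows.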
There is no configuration parameter available to specify the minimum number of containers. It is the responsibility of the Application Master to request the number of containers it needs.
yarn.scheduler.minimum-allocation-mb - Determines the minimum allocation of memory for each container (yarn.scheduler.maximum-allocation-mb determines the maximum allocation for every container request).
yarn.scheduler.minimum-allocation-vcores - Determines the minimum allocation of vCores for each container (yarn.scheduler.maximum-allocation-vcores determines the maximum allocation for every container request).
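To make the role of these bounds concrete, here is a minimal sketch of how a single request might be fitted between the minimum and the maximum. It assumes the scheduler rounds a request up to a multiple of the minimum allocation and rejects anything above the maximum. That is my understanding of the Capacity Scheduler; other schedulers may round with a separate increment setting. The class name and the sample values are hypothetical.

```java
public class AllocationNormalizer {
    /**
     * Fits a requested amount (MB or vCores) between the scheduler bounds.
     * Assumption: requests are rounded up to a multiple of the minimum
     * and requests above the maximum are rejected.
     */
    static int normalize(int requested, int minimum, int maximum) {
        if (requested > maximum) {
            throw new IllegalArgumentException(
                "Requested " + requested + " exceeds maximum allocation " + maximum);
        }
        int atLeastMinimum = Math.max(requested, minimum);
        // Round up to the next multiple of the minimum allocation.
        int rounded = ((atLeastMinimum + minimum - 1) / minimum) * minimum;
        return Math.min(rounded, maximum);
    }

    public static void main(String[] args) {
        // Example bounds: minimum 1024 MB, maximum 8192 MB.
        System.out.println(normalize(1500, 1024, 8192));
        System.out.println(normalize(3, 1024, 8192));
    }
}
```

With the example bounds, a 1500 MB request would be granted as 2048 MB, and a very small request would be raised to the 1024 MB minimum.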
In your case, you are requesting "mapreduce.map.memory.mb" = 3m (3MB) and "mapreduce.map.cpu.vcores" = 4 (4 vCores).
So, you will get 1 container with 4 vCores for each mapper (assuming yarn.scheduler.maximum-allocation-vcores is >= 4).
The parameters "mapreduce.map.memory.mb" and "mapreduce.map.cpu.vcores" are set in the mapred-site.xml file. If these configuration parameters are not marked "final", they can be overridden in the client before submitting the job.
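A related question is how many of these mapper containers can run at the same time on the 8-vCore host. The sketch below is only back-of-the-envelope arithmetic. It assumes the scheduler accounts for vCores as well as memory (with a memory-only resource calculator, only the memory limit would apply), ignores the resources held by the Application Master, and uses hypothetical memory figures of 8192 MB for the node and 1024 MB per container.

```java
public class NodeCapacity {
    /** Containers that fit on one node at once, limited by the scarcer of CPU and memory. */
    static int concurrentContainers(int nodeVcores, int nodeMemoryMb,
                                    int containerVcores, int containerMemoryMb) {
        int byCpu = nodeVcores / containerVcores;
        int byMemory = nodeMemoryMb / containerMemoryMb;
        return Math.min(byCpu, byMemory);
    }

    /** Number of rounds needed to run all tasks when only 'concurrent' fit at a time. */
    static int waves(int tasks, int concurrent) {
        return (tasks + concurrent - 1) / concurrent; // ceiling division
    }

    public static void main(String[] args) {
        // 8 vCores per node, 4 vCores per mapper; memory values are hypothetical.
        int concurrent = concurrentContainers(8, 8192, 4, 1024);
        System.out.println(concurrent);
        System.out.println(waves(10, concurrent));
    }
}
```

Under these assumptions, two mappers fit on the node at a time, so the 10 mappers would run in about five rounds, each in its own container.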
Yes. From the "Application Attempt" page for the application, you can see the number of allocated containers. Check the attached figure above.