Question
I use the synchronous AMRMClient in my application master. I add container requests with AMRMClient's addContainerRequest method and remove them with its getMatchingRequests and removeContainerRequest methods. However, when the program adds container requests with different resources, the ResourceManager stops allocating any resources to the application master, which leads to a deadlock. Has anybody faced this problem?
Answer 1:
Container requests at the same priority should have the same resource requirement for now. A workaround is to give each distinct resource requirement its own priority. JIRA issue YARN-314 has more discussion about this.
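The workaround can be sketched as a small helper that hands out one priority number per distinct container size. This is only an illustration: the class PriorityPerCapability and its method names are hypothetical and not part of Hadoop, and it assumes the returned number would then be passed to Priority.newInstance(int) when building each ContainerRequest.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hadoop): one priority number per container size. */
public class PriorityPerCapability {
    // Maps a "memory/vcores" key to the priority number assigned to it.
    private final Map<String, Integer> priorities = new LinkedHashMap<>();
    private final int basePriority;

    public PriorityPerCapability(int basePriority) {
        this.basePriority = basePriority;
    }

    /** Returns the same priority for the same (memory, vcores) pair, a new one otherwise. */
    public int priorityFor(int memoryMb, int vCores) {
        String key = memoryMb + "MB/" + vCores + "vcores";
        Integer existing = priorities.get(key);
        if (existing != null) {
            return existing;
        }
        int next = basePriority + priorities.size();
        priorities.put(key, next);
        return next;
    }

    public static void main(String[] args) {
        PriorityPerCapability p = new PriorityPerCapability(0);
        System.out.println(p.priorityFor(1024, 1)); // prints 0
        System.out.println(p.priorityFor(2048, 2)); // prints 1
        System.out.println(p.priorityFor(1024, 1)); // prints 0 again (same size)
    }
}
```

Keep in mind that priority also influences the order in which requests are served, so splitting sizes across priorities may change which size is allocated first.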
The following detailed code analysis applies to Hadoop 2.7.3.
The key to the problem is in AppSchedulingInfo, which organizes all requests by priority and resource name:
final Map<Priority, Map<String, ResourceRequest>> requests
So for any given priority and resource name, there can be only one ResourceRequest, which holds the priority, the CPU and memory requirements, and the number of containers.
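The effect of keying by priority and resource name alone can be shown with a simplified model. The classes below are hypothetical stand-ins, not the real AppSchedulingInfo or ResourceRequest, and the update method only assumes that a new request for an existing key replaces the stored one.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for the scheduler-side table; not the real AppSchedulingInfo. */
public class SchedulerTableModel {
    /** Hypothetical, reduced version of a ResourceRequest. */
    public static final class Request {
        final int priority;
        final String resourceName;
        final int memoryMb;
        final int numContainers;

        public Request(int priority, String resourceName, int memoryMb, int numContainers) {
            this.priority = priority;
            this.resourceName = resourceName;
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    // priority -> resource name -> the single request stored for that pair
    private final Map<Integer, Map<String, Request>> requests = new HashMap<>();

    /** Stores the request; an entry with the same priority and resource name is replaced. */
    public void update(Request r) {
        requests.computeIfAbsent(r.priority, k -> new HashMap<>()).put(r.resourceName, r);
    }

    /** Returns the stored request for the pair, or null if there is none. */
    public Request get(int priority, String resourceName) {
        Map<String, Request> byName = requests.get(priority);
        return byName == null ? null : byName.get(resourceName);
    }

    public static void main(String[] args) {
        SchedulerTableModel table = new SchedulerTableModel();
        table.update(new Request(1, "*", 1024, 3)); // three 1024 MB containers
        table.update(new Request(1, "*", 2048, 2)); // same key, different size
        Request kept = table.get(1, "*");
        // Only the second request survives; the 1024 MB ask is gone.
        System.out.println(kept.memoryMb + " MB x " + kept.numContainers); // prints 2048 MB x 2
    }
}
```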
In AMRMClientImpl, container requests are kept in the following data structure:
class ResourceRequestInfo {
    ResourceRequest remoteRequest;
    ...
}
Map<Priority, Map<String, TreeMap<Resource, ResourceRequestInfo>>> remoteRequestsTable;
So for the same priority and resource name, different resource requirements produce different ResourceRequest instances.
When the AM calls AMRMClientImpl.addContainerRequest with the same resource requirements, it increases the number of containers in the previously created ResourceRequest instance. When it adds different resource requirements for the same priority and resource name, it ends up with different ResourceRequest instances, but only the last one is kept in AppSchedulingInfo for scheduling. That is why the earlier container requests get lost and are never allocated.
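The client-side bookkeeping can be modelled the same way. The sketch below is a hypothetical simplification, not the real AMRMClientImpl: the capability is reduced to memory in MB and ResourceRequestInfo to a plain container count, which is enough to show that the client tracks one entry per capability.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Simplified stand-in for the client-side table; not the real AMRMClientImpl. */
public class ClientTableModel {
    // priority -> resource name -> capability (memory in MB) -> pending container count
    private final Map<Integer, Map<String, TreeMap<Integer, Integer>>> table = new HashMap<>();

    /** Adds one container request; returns the count now pending for that capability. */
    public int addContainerRequest(int priority, String resourceName, int memoryMb) {
        TreeMap<Integer, Integer> byCapability = table
                .computeIfAbsent(priority, k -> new HashMap<>())
                .computeIfAbsent(resourceName, k -> new TreeMap<>());
        // Same capability: bump the existing count. New capability: start a new entry.
        return byCapability.merge(memoryMb, 1, Integer::sum);
    }

    /** Number of distinct entries tracked for one priority and resource name. */
    public int distinctRequests(int priority, String resourceName) {
        Map<String, TreeMap<Integer, Integer>> byName = table.get(priority);
        if (byName == null) {
            return 0;
        }
        TreeMap<Integer, Integer> byCapability = byName.get(resourceName);
        return byCapability == null ? 0 : byCapability.size();
    }

    public static void main(String[] args) {
        ClientTableModel client = new ClientTableModel();
        System.out.println(client.addContainerRequest(1, "*", 1024)); // prints 1
        System.out.println(client.addContainerRequest(1, "*", 1024)); // prints 2
        System.out.println(client.addContainerRequest(1, "*", 2048)); // prints 1
        System.out.println(client.distinctRequests(1, "*"));          // prints 2
    }
}
```

The client therefore holds two separate entries for one priority and resource name, while a table keyed only by priority and resource name has room for one.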
Source: https://stackoverflow.com/questions/35116215/yarn-resource-manage-didnt-allocate-containers-when-asking-for-containers-with