Hadoop YARN job is getting stucked at map 0% and reduce 0%

问题

I am trying to run a very simple job to test my hadoop setup so I tried with Word Count Example , which get stuck in 0% , so i tried some other simple jobs and each one of them stuck

52191_0003/
14/07/14 23:55:51 INFO mapreduce.Job: Running job: job_1405376352191_0003
14/07/14 23:55:57 INFO mapreduce.Job: Job job_1405376352191_0003 running in uber mode : false
14/07/14 23:55:57 INFO mapreduce.Job:  map 0% reduce 0%

I am using hadoop version- Hadoop 2.3.0-cdh5.0.2

I did quick research on Google and found to increase

yarn.scheduler.minimum-allocation-mb
yarn.nodemanager.resource.memory-mb

I am having single node cluster, running in my Macbook with dual core and 8 GB Ram.

my yarn-site.xml file -

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager.company.com</value>
  </property>   
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
  </property>
  <property>
  </property>
    <name>yarn.log.aggregation.enable</name>
    <value>true</value> 
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run </description>
  </property>
   <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  </property>

  <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>8092</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx768m</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Execution framework.</description>
    </property>
    <property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>4</value>
        <description>The number of virtual cores required for each map task.</description>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>8092</value>
        <description>Larger resource limit for maps.</description>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx768m</value>
        <description>Heap-size for child jvms of maps.</description>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>jobtracker.alexjf.net:8021</value>
    </property>

 <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8092</value>
    <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>2</value>
    <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>10</value>
    <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
    <description>Physical memory, in MB, to be made available to running containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
    <description>Number of CPU cores that can be allocated for containers.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run </description>
  </property>
   <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

</configuration>

my mapred-site.xml

  <property>    
    <name>mapreduce.framework.name</name>    
    <value>yarn</value>  
  </property>

has only 1 property. tried several permutation and combinations but couldn't get rid of the error.

Log of the job

 23:55:55,694 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2014-07-14 23:55:55,697 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2014-07-14 23:55:55,699 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2014-07-14 23:55:55,769 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: maxContainerCapability: 8092
2014-07-14 23:55:55,769 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: root.abhishekchoudhary
2014-07-14 23:55:55,775 INFO [main] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit on the thread pool size is 500
2014-07-14 23:55:55,777 INFO [main] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
2014-07-14 23:55:55,787 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1405376352191_0003Job Transitioned from INITED to SETUP
2014-07-14 23:55:55,789 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-07-14 23:55:55,800 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1405376352191_0003Job Transitioned from SETUP to RUNNING
2014-07-14 23:55:55,823 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000000 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,824 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000001 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,824 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000002 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,825 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1405376352191_0003_m_000003 Task Transitioned from NEW to SCHEDULED
2014-07-14 23:55:55,826 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,827 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000001_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,827 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000002_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,827 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1405376352191_0003_m_000003_0 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-07-14 23:55:55,828 INFO [Thread-49] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceReqt:8092
2014-07-14 23:55:55,858 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1405376352191_0003, File: hdfs://localhost/tmp/hadoop-yarn/staging/abhishekchoudhary/.staging/job_1405376352191_0003/job_1405376352191_0003_1.jhist
2014-07-14 23:55:56,773 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2014-07-14 23:55:56,799 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1405376352191_0003: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=1

回答1:

Based on the messsage Connecting to ResourceManager at /0.0.0.0:8030, are you sure your ResourceManager is supposed to be at 0.0.0.0:8030 (the default)? If not you should add the following to your yarn-site.xml:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>MASTER ADDRESS</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>${yarn.resourcemanager.hostname}:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>${yarn.resourcemanager.hostname}:8040</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>${yarn.resourcemanager.hostname}:8033</value>
</property>

Replace MASTER ADDRESS with the address of the master node. You can individually change the address of the resource manager's webapp, admin, etc.

回答2:

Your settings appear to be incorrect.

The setting yarn.nodemanager.resource.memory-mb is set to 2GB. This is the "amount of physical memory, in MB, that can be allocated for containers." But your mapreduce.map.memory.mb is 8GB. 8GB is what you're really requesting.

Additionally, you have set yarn.app.mapreduce.am.resource.mb to 8GB. As such, you're trying to allocate an AM which controls the job at 8GB plus several mappers at 8GB.

Solution

To solve the issue, you can drop the AM size to 1GB and then the mapper size to .5GB, which is a more reasonable size for playing around especially for word count.

Additional resources

You can refer to this instruction provided by Clouera to understand these properties in more detail.

回答3:

I don't know if you simply made a copy/paste error when creating this question but looking at your yarn-site.xml it starts with two <property> tags. I'm not sure if Hadoop's xml parser will actually apply those nested <property> tags.

回答4:

I am using Apache Hadoop version 2.7.2 so it might be like "apples-to-oranges" comparison, however I ran into the same silent stuck state the other day. In most of the cases this "silence" for an extended period of time indicates that the scheduler is not able to allocate enough resources to the application.

In my specific case with a similar configuration, increasing the value for property yarn.nodemanager.resource.memory-mb in yarn-site.xml did the trick.

You can also check other properties for resource allocation here

来源：https://stackoverflow.com/questions/24747427/hadoop-yarn-job-is-getting-stucked-at-map-0-and-reduce-0

标签

Hadoop

MapReduce

Cloudera

yarn