The reduce phase fails with "Task attempt failed to report status for 600 seconds. Killing!" What is the solution?

Submitted by 被刻印的时光 ゝ on 2019-12-03 04:28:17

The likely reason for the timeouts is a long-running computation in your reducer that never reports progress back to the Hadoop framework. This can be resolved in several ways:

I. Increasing the timeout in mapred-site.xml:

<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>

The default is 600000 ms = 600 seconds. (In newer Hadoop versions the property is named mapreduce.task.timeout.)

II. Reporting progress every n records, as in the Reducer example in the javadoc:

public void reduce(K key, Iterator<V> values,
                          OutputCollector<K, V> output,
                          Reporter reporter) throws IOException {
   int noValues = 0;
   while (values.hasNext()) {
     V value = values.next();
     noValues++;

     // ... process the value ...

     // report progress every 10 records so the framework
     // knows the task is still alive
     if ((noValues % 10) == 0) {
       reporter.progress();
     }
   }
}
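The throttling pattern above can be sketched without Hadoop dependencies using a stub reporter. ProgressReporter and processValues below are illustrative names, not Hadoop API; only the "ping every 10 records" logic mirrors the reducer.

```java
import java.util.List;

// Stand-in for Hadoop's Reporter; only the progress() call matters here.
interface ProgressReporter {
    void progress();
}

class ThrottledProgress {
    // Walks the values, pinging the reporter every 10 records,
    // exactly as the reducer above would.
    static int processValues(Iterable<Integer> values, ProgressReporter reporter) {
        int noValues = 0;
        for (int v : values) {
            noValues++;
            if (noValues % 10 == 0) {
                reporter.progress();
            }
        }
        return noValues;
    }
}
```

With 25 records, the reporter is pinged twice (after record 10 and record 20), so a slow per-record computation never leaves the framework waiting longer than 10 records' worth of work.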

Optionally, you can increment a custom counter instead, which also signals to the framework that the task is alive; NUM_RECORDS here is a user-defined enum constant:

reporter.incrCounter(NUM_RECORDS, 1);

It is also possible that you have exhausted Java's heap space, or that garbage collection is running so frequently that the reducer never gets a chance to report status to the master and is therefore killed.

Another possibility is that one of the reducers is receiving heavily skewed data, i.e. a large number of records all share a particular rid.

Try increasing your Java heap by setting the following config: mapred.child.java.opts

to

-Xmx2048m
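Following the same mapred-site.xml convention as above, this would look like (the 2 GB value is the example from this answer, not a universal recommendation):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```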

Also, try reducing the number of parallel reducers by setting the following config to a value lower than its current one (the default is 2):

mapred.tasktracker.reduce.tasks.maximum
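In the same mapred-site.xml style, for example limiting each TaskTracker to one concurrent reduce task (the value 1 is illustrative; anything below the current setting reduces memory pressure):

```xml
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```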
