How to update task tracker that my mapper is still running fine as opposed to generating timeout?

Submitted by 半世苍凉 on 2020-01-02 17:25:32

Question


I forgot which API/method to call, but my problem is this:

My mapper will run for more than 10 minutes, and I don't want to increase the default timeout.

Instead, I want my mapper to send an update ping to the task tracker whenever it is in the particular code path that takes longer than 10 minutes.

Please let me know which API/method to call.


Answer 1:


You can simply increment a counter and call progress. This ensures that the task sends a heartbeat back to the tasktracker, so the tasktracker knows the task is still alive.

In the new API, this is done through the context; see here: http://hadoop.apache.org/common/docs/r1.0.0/api/index.html

For example:

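@Override
protected void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
    // Increment a counter; SOME_ENUM stands for any enum constant you define.
    context.getCounter(SOME_ENUM).increment(1);
    // Tell the framework that this task is still alive.
    context.progress();
}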

In the old API, the same is done through the Reporter interface that is passed to map(): http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html
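
If the slow code path is a tight loop, calling progress on every iteration can be wasteful. One possible approach is a small throttling helper that forwards the call at most once per interval. The following is only a sketch under stated assumptions: ProgressSketch, ThrottledProgress and the nested Progressable interface are hypothetical names introduced here, and Progressable is a local stand-in for Hadoop's own type so that the example runs without Hadoop on the classpath. The clock is injected so the behaviour can be checked without waiting for real time to pass.

```java
import java.util.function.LongSupplier;

public class ProgressSketch {

    /** Hypothetical local stand-in for Hadoop's progress callback (not the real type). */
    public interface Progressable {
        void progress();
    }

    /** Forwards progress() to the target at most once per interval. */
    public static final class ThrottledProgress {
        private final Progressable target;
        private final long intervalMillis;
        private final LongSupplier clockMillis;
        private long lastReport;

        public ThrottledProgress(Progressable target, long intervalMillis,
                                 LongSupplier clockMillis) {
            this.target = target;
            this.intervalMillis = intervalMillis;
            this.clockMillis = clockMillis;
            this.lastReport = clockMillis.getAsLong();
        }

        /** Cheap to call from a hot loop; reports only when the interval has elapsed. */
        public void tick() {
            long now = clockMillis.getAsLong();
            if (now - lastReport >= intervalMillis) {
                target.progress();
                lastReport = now;
            }
        }
    }

    public static void main(String[] args) {
        long[] fakeNow = {0};   // simulated clock, in milliseconds
        int[] calls = {0};      // number of progress() calls observed

        // Report once per simulated minute, well below a 10-minute timeout.
        ThrottledProgress progress =
            new ThrottledProgress(() -> calls[0]++, 60_000, () -> fakeNow[0]);

        // Simulate 15 minutes of work in 1-second steps.
        for (int step = 1; step <= 900; step++) {
            fakeNow[0] = step * 1000L;
            progress.tick();
        }
        System.out.println("progress() called " + calls[0] + " times"); // 15
    }
}
```

In a real mapper, one would presumably pass context::progress (new API) or reporter::progress (old API) as the target and System::currentTimeMillis as the clock, then call tick() inside the long-running loop.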




Answer 2:


You typically use the Reporter to let the framework know you're still alive.

Quote from the javadoc:

Mapper and Reducer can use the Reporter provided to report progress or just indicate that they are alive. In scenarios where the application takes a [significant] amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task.



Source: https://stackoverflow.com/questions/11814469/how-to-update-task-tracker-that-my-mapper-is-still-running-fine-as-opposed-to-ge
