How to find the CPU time taken by a Map/Reduce task in Hadoop

Posted by 房东的猫 on 2019-12-12 03:18:15

Question


I am writing a Hadoop scheduler. My scheduling requires finding the CPU time taken by each Map/Reduce task.

I know that:

  • The TaskInProgress class maintains the execStartTime and execFinishTime values, which are the wall-clock times at which the task started and finished, so they do not accurately indicate the CPU time consumed by the task.

  • Each task is executed in a new JVM, so I could use the OperatingSystemMXBean.getProcessCpuTime() method, but the description of the method only tells me: "Returns the CPU time used by the process on which the Java virtual machine is running in nanoseconds". I am not entirely clear if this is what I want.
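
For reference, the second option can be tried out with a minimal sketch like the one below. It assumes a HotSpot/OpenJDK-style JVM in which the platform bean implements com.sun.management.OperatingSystemMXBean; the class name ProcessCpuTimeDemo and the helper processCpuTimeNanos are illustrative and not part of Hadoop. The value covers the whole JVM process (all threads, including GC and JIT threads), and it has to be read from inside the task's own JVM.

```java
import java.lang.management.ManagementFactory;

public class ProcessCpuTimeDemo {

    /**
     * CPU time consumed so far by this JVM process, in nanoseconds.
     * Returns -1 if the platform bean does not expose the value.
     */
    static long processCpuTimeNanos() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        // Assumption: the JVM ships the com.sun.management extension.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1L;
    }

    public static void main(String[] args) throws InterruptedException {
        long cpuStart = processCpuTimeNanos();
        long wallStart = System.nanoTime();

        Thread.sleep(500);                      // wall-clock time passes, little CPU is used
        long checksum = 0;
        for (int i = 0; i < 50_000_000; i++) {  // CPU-bound work
            checksum += i % 7;
        }

        long cpuNanos = processCpuTimeNanos() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;

        System.out.println("checksum = " + checksum);
        System.out.println("process CPU time (ms) = " + cpuNanos / 1_000_000);
        System.out.println("wall-clock time (ms)  = " + wallNanos / 1_000_000);
    }
}
```

Comparing the two printed figures on a given machine shows how the sleep interval is accounted for; on a JVM that supports the call, the CPU figure is expected to exclude most of it.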


Answer 1:


I am using a library that records resource metrics such as CPU usage/idle time, swap usage, and memory usage:

http://code.google.com/p/hadoop-toolkit/

You have to extract a patch and apply it to the 0.20.2 tagged version of Hadoop.

Regarding the remark in the question, "I am not entirely clear if this is what I want":

I am pretty sure that this method returns the wall-clock time as well.




Answer 2:


Just for posterity, I solved this problem by changing line 572 of src/mapred/org/apache/hadoop/mapred/TaskLog.java (Hadoop 0.20.203):

mergedCmd.append("exec setsid 'time' ");    // add 'time'

The CPU time will be written to logs/userlogs/JOBID/TASKID/stderr. I also wrote a script to reap the cumulative CPU time: https://gist.github.com/1984365

Before running the job, make sure you run:

rm -rf logs/userlogs/*

so that the script works.
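
As a rough illustration of the reaping step (this is not the script from the gist above), the sketch below walks a log directory, reads every file named stderr, and sums the user and system seconds it finds. It assumes the default GNU time summary format ("1.23user 0.45system 0:02.10elapsed ..."); other time implementations print a different layout. The class name TaskCpuTimeSummer and the helper cpuSeconds are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaskCpuTimeSummer {

    // Assumption: default GNU time output, e.g. "1.23user 0.45system 0:02.10elapsed ..."
    private static final Pattern TIME_LINE =
            Pattern.compile("(\\d+(?:\\.\\d+)?)user\\s+(\\d+(?:\\.\\d+)?)system");

    /** Sums user + system seconds over every time summary found in the text. */
    static double cpuSeconds(String text) {
        double total = 0.0;
        Matcher m = TIME_LINE.matcher(text);
        while (m.find()) {
            total += Double.parseDouble(m.group(1)) + Double.parseDouble(m.group(2));
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Default matches the directory named in the answer; pass another path to override.
        Path root = Paths.get(args.length > 0 ? args[0] : "logs/userlogs");

        List<Path> stderrFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            stderrFiles = paths
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals("stderr"))
                    .collect(Collectors.toList());
        }

        double total = 0.0;
        for (Path file : stderrFiles) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            double seconds = cpuSeconds(text);
            System.out.printf("%s\t%.2f%n", file, seconds);
            total += seconds;
        }
        System.out.printf("total\t%.2f%n", total);
    }
}
```

Because the sketch adds up every stderr file under the directory, logs left over from earlier jobs would inflate the total, which is why the directory is cleared first.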



Source: https://stackoverflow.com/questions/9365812/how-to-find-the-cpu-time-taken-by-a-map-reduce-task-in-hadoop
