Question
I am writing a Hadoop scheduler. My scheduling requires finding the CPU time taken by each Map/Reduce task.
I know that:
The TaskInProgress class maintains the execStartTime and execFinishTime values which are wall-clock times when the process started and finished, but they do not accurately indicate the CPU time consumed by the task.
Each task is executed in a new JVM, and I could use the OperatingSystemMXBean.getProcessCpuTime() method, but its documentation says: "Returns the CPU time used by the process on which the Java virtual machine is running in nanoseconds." I am not entirely clear if this is what I want.
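Since each task does run in its own child JVM, a call to getProcessCpuTime() made inside that JVM measures only that task's process, which is the per-task CPU time (not wall-clock time). A minimal sketch of how the call is made; note that the method lives on the com.sun.management subinterface, so a cast is required, and the class name TaskCpuTime is just for illustration:

```java
import java.lang.management.ManagementFactory;

public class TaskCpuTime {
    public static void main(String[] args) {
        // Burn some CPU so the counter visibly advances.
        long sink = 0;
        for (int i = 0; i < 50_000_000; i++) sink += i;

        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();

        // getProcessCpuTime() is defined on com.sun.management.OperatingSystemMXBean,
        // which HotSpot/OpenJDK JVMs return from the factory above; other JVMs may not.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            long cpuNanos =
                    ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
            System.out.println("JVM CPU time so far: " + cpuNanos + " ns (sink=" + sink + ")");
        }
    }
}
```

The returned value covers all threads of the JVM process (user + system time), and is -1 if the platform does not support the measurement.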
Answer 1:
I am using a library that records resource metrics such as CPU usage/idle time, swap usage, and memory usage:
http://code.google.com/p/hadoop-toolkit/
You have to extract a patch and apply it to the Hadoop 0.20.2 tag.
As for "I am not entirely clear if this is what I want" — I am pretty sure that this method returns the wall-clock time as well.
Answer 2:
Just for posterity: I solved this problem by making a change in src/mapred/org/apache/hadoop/mapred/TaskLog.java (Hadoop 0.20.203) on line 572:
mergedCmd.append("exec setsid 'time' "); // add 'time'
The CPU time will then be written to logs/userlogs/JOBID/TASKID/stderr. I also wrote a script that reaps the cumulative CPU time: https://gist.github.com/1984365
Before running the job, you need to make sure you do:
rm -rf logs/userlogs/*
so that the script works.
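The gist itself is not reproduced here, but the parsing it needs can be sketched as follows. This is a hypothetical helper (class name TimeLogParser is mine), assuming the `time` prepended above writes POSIX-style "real/user/sys 0m1.23s" lines to each task's stderr; the exact format depends on which `time` implementation the shell resolves. The task's CPU time is user + sys, not real:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeLogParser {
    // Matches durations like "0m1.23s" in `time` output lines such as "user\t0m1.23s".
    private static final Pattern DURATION = Pattern.compile("(\\d+)m([\\d.]+)s");

    // Converts one output line to seconds, e.g. "user\t1m2.50s" -> 62.5.
    static double seconds(String line) {
        Matcher m = DURATION.matcher(line);
        return m.find()
                ? Integer.parseInt(m.group(1)) * 60 + Double.parseDouble(m.group(2))
                : 0.0;
    }

    // Sums user + sys (the actual CPU time) from one task's stderr text.
    static double cpuSeconds(String stderr) {
        double total = 0.0;
        for (String line : stderr.split("\n")) {
            if (line.startsWith("user") || line.startsWith("sys")) {
                total += seconds(line);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample of what `time` typically writes to a task's stderr.
        String sample = "real\t0m2.05s\nuser\t1m1.20s\nsys\t0m0.30s\n";
        System.out.println("CPU seconds: " + cpuSeconds(sample));
    }
}
```

Summing this value over every logs/userlogs/JOBID/TASKID/stderr file gives the job's cumulative CPU time, which is what the gist's script does.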
Source: https://stackoverflow.com/questions/9365812/how-to-find-the-cpu-time-taken-by-a-map-reduce-task-in-hadoop