How to get completed job's statistics executed by Hadoop?

社会主义新天地 提交于 2019-12-24 10:57:56

问题


When we run data intensive job over Hadoop. Hadoop executes the job. Now what i want is when the job is completed. it will give me the statistics regarding executed job i.e; time consumed, mapper quantity, reducer quantity and other useful information.

The information displayed in browser like job tracker, data node during the job execution. But how can i get the statistics in my application which runs the job over Hadoop and gives me results like a report at the end of job completion. My application is in JAVA

Any API which can help me. Suggestions will be appreciated.


回答1:


Look into the following methods of JobClient:

  • getMapTaskReports(JobID)
  • getReduceTaskReports(JobID)

Both these calls return arrays of TaskReport object, from which you can pull start / finish times, and individual counters for each task




回答2:


Chirs is correct. The documentation of TaskReport states that org.apache.hadoop.mapred.TaskReport inherits those methods from org.apache.hadoop.mapreduce.TaskReport. So, one could get such values.

Here are the codes to get the start and end time of a job, grouped for each Map and Reduce tasks.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.conf.Configuration;
import java.net.InetSocketAddress;
import java.util.*;
import org.apache.hadoop.mapred.TaskReport;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.util.StringUtils;
import java.text.SimpleDateFormat;

public class mini{
        public static void main(String args[]){
                String jobTrackerHost = "192.168.151.14";
                int jobTrackerPort = 54311;

                try{
                        Configuration conf = new Configuration();
                        JobClient jobClient = new JobClient(new InetSocketAddress(jobTrackerHost, jobTrackerPort), conf);
                        JobStatus[] activeJobs = jobClient.jobsToComplete();
                        SimpleDateFormat dateFormat = new SimpleDateFormat("d-MMM-yyyy HH:mm:ss");
                        for(JobStatus js: activeJobs){
                                System.out.println(js.getJobID());
                                RunningJob runningjob = jobClient.getJob(js.getJobID());
                                            while(runningjob.isComplete() == false){ /*Wait till the job completes.*/}
                                TaskReport[] maptaskreports = jobClient.getMapTaskReports(js.getJobID());
                                for(TaskReport tr: maptaskreports){
                                        System.out.println("Task ID: "+tr.getTaskID()+" Start TIme: "+StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getStartTime(), 0)+" Finish Time: "+StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getFinishTime(), tr.getStartTime()));
                                }
                                TaskReport[] reducetaskreports = jobClient.getReduceTaskReports(js.getJobID());
                                for(TaskReport tr: reducetaskreports){
                                        System.out.println("Task ID: "+tr.getTaskID()+" Start TIme: "+StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getStartTime(), 0)+" Finish Time: "+StringUtils.getFormattedTimeWithDiff(dateFormat, tr.getFinishTime(), tr.getStartTime()));
                                }

                        }
                }catch(Exception ex){
                        ex.printStackTrace();
                }
        }
}

This is a simple example to get the Start and Finish time of a running job. You can in the way you want.

And here is the run of this program for a "Word Count" MapReduce job.

[root@dev1-slave1 ~]# java -classpath /usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:. mini
job_201501151144_0042
Task ID: task_201501151144_0042_m_000000 Start TIme: 16-Jan-2015 17:07:35 Finish Time: 16-Jan-2015 17:07:43 (7sec)
Task ID: task_201501151144_0042_m_000001 Start TIme: 16-Jan-2015 17:07:35 Finish Time: 16-Jan-2015 17:07:56 (20sec)
Task ID: task_201501151144_0042_m_000002 Start TIme: 16-Jan-2015 17:07:35 Finish Time: 16-Jan-2015 17:07:43 (7sec)
Task ID: task_201501151144_0042_m_000003 Start TIme: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:07:53 (10sec)
Task ID: task_201501151144_0042_m_000004 Start TIme: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:07:53 (10sec)
Task ID: task_201501151144_0042_r_000000 Start TIme: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:08:00 (17sec)
Task ID: task_201501151144_0042_r_000001 Start TIme: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:08:05 (22sec)
Task ID: task_201501151144_0042_r_000002 Start TIme: 16-Jan-2015 17:07:43 Finish Time: 16-Jan-2015 17:08:05 (21sec)

Its good to open the desired jsp files of hadoop in its mapreduce/src/webapps/job/ directory and figure out how JOBTRACKER Web UI is displaying information.

I have derived above codes from jobtasks.jsp.

Hope it helps. :)



来源:https://stackoverflow.com/questions/16180654/how-to-get-completed-jobs-statistics-executed-by-hadoop

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!