Measure Hadoop job time using JobControl

六眼飞鱼酱① submitted on 2019-12-14 03:04:15

Question


I used to launch my Hadoop job with the following code:

long start = System.currentTimeMillis();
boolean status = job.waitForCompletion(true);
long end = System.currentTimeMillis();

This let me measure the job's elapsed time directly in my code as soon as the job ended.

Now I have to use JobControl to express dependencies between my jobs:

JobControl jobControl = new JobControl("MyJob");
jobControl.addJob(job1);
jobControl.addJob(job2);
job3.addDependingJob(job2);
jobControl.addJob(job3);

jobControl.run();

However, once jobControl.run() is called, execution never gets past it, so I cannot add code after it to poll jobControl.getState() for completion of the jobs.

How can I measure the time taken by a job using JobControl?


Answer 1:


JobControl has no built-in hook for getting this information. You have a few (potentially painful) options to try:

  • Start JobControl.run() in a separate thread and, in your main thread, poll the JobControl.getXXXJobs() methods to track when jobs change state.
  • Look into the Job End Notification URL hook. This requires you to start a 'server' in your client to receive the notification events, and you then have to work backwards from the time each job ends.
  • Extend the JobControl and jobcontrol.Job classes to record when a job changes state, and add methods to query the start and end times.
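
The first option can be sketched roughly as follows. This is a minimal, Hadoop-free sketch rather than a definitive implementation: JobTimer, timeUntilFinished, and pollMillis are hypothetical names introduced here for illustration. It relies on JobControl implementing Runnable and exposing allFinished() and stop(), and it measures the wall-clock time for the whole group of jobs, not for each job individually.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Hadoop): runs a blocking controller on a
// background thread and polls a completion check from the calling thread.
final class JobTimer {
    private JobTimer() {}

    /**
     * Starts the controller on a daemon thread, polls finished every
     * pollMillis milliseconds until it returns true, then calls stopper.
     * Returns the elapsed wall-clock time in milliseconds.
     */
    static long timeUntilFinished(Runnable controller, BooleanSupplier finished,
                                  Runnable stopper, long pollMillis) {
        long start = System.nanoTime();
        Thread runner = new Thread(controller, "job-control-runner");
        runner.setDaemon(true); // do not keep the JVM alive if the loop lingers
        runner.start();
        try {
            while (!finished.getAsBoolean()) {
                Thread.sleep(pollMillis);
            }
            // Elapsed time is taken before the stopper runs in the finally block.
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } finally {
            stopper.run(); // ask the controller loop to exit
        }
    }
}
```

With the jobControl object from the question, a call might look like JobTimer.timeUntilFinished(jobControl, jobControl::allFinished, jobControl::stop, 1000L). Finished does not mean successful, so it is probably worth checking jobControl.getFailedJobList() afterwards.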


Source: https://stackoverflow.com/questions/10119460/measure-hadoop-job-time-using-jobcontrol
