What is a job history server in Hadoop and why is it mandatory to start the history server before starting Pig in MapReduce mode?

Submitted by 一曲冷凌霜 on 2019-12-11 07:50:51

Question


Before starting Pig in MapReduce mode, you always have to start the history server; otherwise, executing Pig Latin statements produces the following log output:

  2018-10-18 15:59:13,709 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. **Redirecting to job history server**

  2018-10-18 15:59:14,713 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

As the logs above show, the Pig execution engine is trying to connect to the history server. What is the role of the job history server in Hadoop, and why does Pig need to connect to it for a MapReduce job?


Answer 1:


The JobTracker or ResourceManager keeps all job information in memory. To avoid running out of memory, it drops the information for finished jobs. Tracking of these past jobs is delegated to the JobHistory server.

The Pig client pulls job counter statistics when its jobs finish. Those statistics may still be held by the JobTracker/ResourceManager, or Pig may need to ask the JobHistory server for them. When the JobHistory server is down, the client prints those log messages, but it should eventually still succeed, just with the statistics missing.
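
To check whether this is what is happening, you can test whether anything is listening on the JobHistory IPC port. The snippet below is only a minimal sketch in Python and is not part of Pig or Hadoop. The function name `history_server_reachable` is hypothetical, and the host and port default to `localhost` and `10020`, the port shown in the log above. Adjust them if `mapreduce.jobhistory.address` is set to something else in your cluster.

```python
import socket


def history_server_reachable(host="localhost", port=10020, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the JobHistory server listens on the port from the log
    (10020); this only checks that the port accepts connections, not that
    the process behind it is actually a JobHistory server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    if history_server_reachable():
        print("JobHistory port is accepting connections")
    else:
        print("Nothing is listening on the JobHistory port; "
              "expect the 'Retrying connect to server' messages")
```

If the check fails, starting the history server should make the retry messages go away. On Hadoop 2.x installations this is typically done with `mr-jobhistory-daemon.sh start historyserver`, but the exact command depends on your Hadoop version and distribution.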



Source: https://stackoverflow.com/questions/52872301/what-is-a-job-history-server-in-hadoop-and-why-is-it-mandatory-to-start-the-hist
