How can I tell how many mappers and reducers are running?

Submitted by 拥有回忆 on 2019-12-10 11:52:54

Question


I have a task that is designed to run dozens of map/reduce jobs. Some of them are IO intensive, some are mapper intensive, some are reducer intensive. I would like to be able to monitor the number of mappers and reducers currently in use so that, when a set of mappers is freed up, I can push another mapper intensive job to the cluster. I don't want to just stack them up on the queue because they might clog up the mapper and not let the reducer-intensive ones run.

Is there a command line interface I can call to get this information from (for instance) a Python script?


Answer 1:


Hadoop job status can be accessed in the following ways.

  • Hadoop jobs can be administered through the Hadoop web UI.

    The JobTracker shows job details; its default port is 50030 (localhost:50030 in pseudo-distributed mode).

    TaskTrackers show the individual map/reduce tasks and are available at the default port 50060.

  • Hadoop provides a REST API to access cluster, node, application, and application-history information.

    This REST API can also be called from a Python script to get the application status: http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
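As a minimal sketch of calling that REST API from Python (assumptions: the YARN ResourceManager is on localhost at its default port 8088, and the `/ws/v1/cluster/apps` endpoint and `runningContainers` field match your Hadoop version — check your cluster's docs, since field names vary between releases):

```python
import json
import urllib.request

# Assumed ResourceManager address; adjust for your cluster.
RM_URL = "http://localhost:8088"

def fetch_running_apps(rm_url=RM_URL):
    """Fetch the list of RUNNING applications from the YARN REST API."""
    with urllib.request.urlopen(rm_url + "/ws/v1/cluster/apps?state=RUNNING") as resp:
        return json.load(resp)

def count_running_containers(apps_json):
    """Sum runningContainers across all applications in an /apps response.

    Note: YARN reports generic containers, not mappers/reducers specifically;
    per-task-type counts live in the MapReduce Application Master API.
    """
    apps = (apps_json.get("apps") or {}).get("app") or []
    return sum(app.get("runningContainers", 0) for app in apps)
```

You could poll `count_running_containers(fetch_running_apps())` from your scheduler script and only submit the next mapper-heavy job when the count drops below your cluster's slot capacity.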




Answer 2:


I discovered that

mapred job -list

will list all of the jobs currently running, and

mapred job -status <job_id>

will provide the number of mappers and reducers for each job.
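Those two commands can be driven from a Python script with `subprocess`. A rough sketch (assumptions: `mapred` is on the PATH, and the exact text layout of `-list`/`-status` output varies by Hadoop version, so the parsing below is regex-based and should be checked against your cluster's actual output):

```python
import re
import subprocess

def list_running_job_ids():
    """Run `mapred job -list` and extract the job IDs from its output."""
    out = subprocess.run(["mapred", "job", "-list"],
                         capture_output=True, text=True, check=True).stdout
    return parse_job_ids(out)

def parse_job_ids(listing):
    """Pull job_<timestamp>_<seq> identifiers out of the -list output."""
    return re.findall(r"\bjob_\d+_\d+\b", listing)

def parse_task_counts(status_text):
    """Extract launched map/reduce task counts from `mapred job -status` output.

    Relies on the 'Launched map tasks' / 'Launched reduce tasks' job counters
    appearing as 'name=value' lines; verify against your version's output.
    """
    counts = {}
    for kind in ("map", "reduce"):
        m = re.search(r"Launched %s tasks=(\d+)" % kind, status_text)
        if m:
            counts[kind] = int(m.group(1))
    return counts
```

A scheduler loop could call `list_running_job_ids()`, run `mapred job -status <job_id>` for each ID, and feed the output to `parse_task_counts()` to see how many map and reduce tasks each job is using.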



Source: https://stackoverflow.com/questions/27160390/how-can-i-tell-how-many-mappers-and-reducers-are-running
