Question
I have a task that is designed to run dozens of map/reduce jobs. Some of them are I/O intensive, some are mapper intensive, and some are reducer intensive. I would like to monitor the number of mappers and reducers currently in use so that, when a set of mappers frees up, I can push another mapper-intensive job to the cluster. I don't want to simply stack the jobs up on the queue, because they might clog up the mappers and prevent the reducer-intensive jobs from running.
Is there a command-line interface I can call, for instance from a Python script, to get this information?
Answer 1:
Hadoop job status can be accessed in the following ways.
Hadoop jobs can be administered through the Hadoop web UI.
The JobTracker shows job details; its default port is 50030 (localhost:50030 in pseudo-distributed mode).
TaskTrackers show the individual map/reduce tasks and are available on the default port 50060.
Hadoop also provides a REST API for cluster, node, application, and application-history information.
This REST API can likewise be called from a Python script to get the application status: http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
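For example, here is a minimal sketch of calling that REST API from Python. It assumes the YARN ResourceManager's web services are reachable at resourcemanager:8088 (the host name and port are placeholders for your cluster). It lists the running MapReduce applications via the cluster apps endpoint and, through the ResourceManager's proxy to each MapReduce Application Master, sums the map and reduce tasks currently running:

    # Sketch: count running map and reduce tasks via the YARN REST API.
    # "resourcemanager:8088" is an assumption; point RM at your cluster.
    import json
    from urllib.request import urlopen

    RM = "http://resourcemanager:8088"

    def get_json(url):
        """Fetch a URL and decode the JSON body."""
        with urlopen(url) as resp:
            return json.load(resp)

    # All currently running applications, filtered to MapReduce.
    apps = get_json(RM + "/ws/v1/cluster/apps?states=RUNNING&applicationTypes=MAPREDUCE")
    maps_running = reduces_running = 0
    for app in ((apps.get("apps") or {}).get("app") or []):
        # The RM proxies each running MapReduce Application Master's REST API,
        # whose job objects report mapsRunning / reducesRunning counts.
        jobs = get_json("%s/proxy/%s/ws/v1/mapreduce/jobs" % (RM, app["id"]))
        for job in ((jobs.get("jobs") or {}).get("job") or []):
            maps_running += job["mapsRunning"]
            reduces_running += job["reducesRunning"]

    print("maps running: %d, reduces running: %d" % (maps_running, reduces_running))

A scheduler loop could poll this a few times a minute and submit the next mapper-intensive job once maps_running drops below a threshold of your choosing.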
Answer 2:
I discovered that

    mapred job -list

will list all of the jobs currently running, and

    mapred job -status <job_id>

will report the number of mappers and reducers for each job.
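Since these are ordinary shell commands, they are easy to drive from the Python script mentioned in the question. A minimal sketch, assuming mapred is on the PATH and that -status prints "Number of maps:" and "Number of reduces:" lines as Hadoop 2.x does (the exact output text varies between versions, so adjust the patterns to what your cluster prints):

    # Sketch: shell out to the mapred CLI and parse its output.
    import re
    import subprocess

    def running_job_ids():
        """Return the IDs of all jobs reported by 'mapred job -list'."""
        out = subprocess.check_output(["mapred", "job", "-list"]).decode()
        return re.findall(r"\bjob_\d+_\d+\b", out)

    def map_reduce_counts(job_id):
        """Return (num_maps, num_reduces) reported for one job."""
        out = subprocess.check_output(["mapred", "job", "-status", job_id]).decode()
        # Assumed output lines; verify against your Hadoop version.
        maps = re.search(r"Number of maps:\s*(\d+)", out)
        reduces = re.search(r"Number of reduces:\s*(\d+)", out)
        return (int(maps.group(1)) if maps else 0,
                int(reduces.group(1)) if reduces else 0)

    for job_id in running_job_ids():
        m, r = map_reduce_counts(job_id)
        print("%s: %d maps, %d reduces" % (job_id, m, r))

Screen-scraping the CLI is simpler to set up than the REST API, but its output format is not a stable interface, so the REST approach from Answer 1 is the more robust choice for long-lived tooling.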
Source: https://stackoverflow.com/questions/27160390/how-can-i-tell-how-many-mappers-and-reducers-are-running