Question
I have a task that is designed to run dozens of map/reduce jobs. Some of them are I/O intensive, some are mapper intensive, and some are reducer intensive. I would like to monitor the number of mappers and reducers currently in use so that, when a set of mappers frees up, I can push another mapper-intensive job to the cluster. I don't want to simply stack the jobs up on the queue, because they might clog up the mappers and prevent the reducer-intensive jobs from running.
Is there a command-line interface I can call, for instance from a Python script, to get this information?
Answer 1:
Hadoop job status can be accessed in the following ways.
Hadoop jobs can be administered through the Hadoop web UI.
The JobTracker shows job details; its default port is 50030 (localhost:50030 in pseudo-distributed mode).
TaskTrackers show the individual map/reduce tasks and are available on the default port 50060.
Hadoop also provides a REST API for cluster, node, application, and application-history information.
This REST API can likewise be called from a Python script to get the application status: http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
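For example, here is a minimal sketch of calling that REST API from Python. It assumes the YARN ResourceManager's web services are reachable at resourcemanager:8088 (the host name and port are placeholders for your cluster). It lists the running MapReduce applications via the cluster apps endpoint and, through the ResourceManager's proxy to each MapReduce Application Master, sums the map and reduce tasks currently running:

    # Sketch: count running map and reduce tasks via the YARN REST API.
    # "resourcemanager:8088" is an assumption; point RM at your cluster.
    import json
    from urllib.request import urlopen

    RM = "http://resourcemanager:8088"

    def get_json(url):
        """Fetch a URL and decode the JSON body."""
        with urlopen(url) as resp:
            return json.load(resp)

    # All currently running applications, filtered to MapReduce.
    apps = get_json(RM + "/ws/v1/cluster/apps?states=RUNNING&applicationTypes=MAPREDUCE")
    maps_running = reduces_running = 0
    for app in ((apps.get("apps") or {}).get("app") or []):
        # The RM proxies each running MapReduce Application Master's REST API,
        # whose job objects report mapsRunning / reducesRunning counts.
        jobs = get_json("%s/proxy/%s/ws/v1/mapreduce/jobs" % (RM, app["id"]))
        for job in ((jobs.get("jobs") or {}).get("job") or []):
            maps_running += job["mapsRunning"]
            reduces_running += job["reducesRunning"]

    print("maps running: %d, reduces running: %d" % (maps_running, reduces_running))

A scheduler loop could poll this a few times a minute and submit the next mapper-intensive job once maps_running drops below a threshold of your choosing.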
Answer 2:
I discovered that

    mapred job -list

will list all of the jobs currently running, and

    mapred job -status <job_id>

will report the number of mappers and reducers for each job.
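Since these are ordinary shell commands, they are easy to drive from the Python script mentioned in the question. A minimal sketch, assuming mapred is on the PATH and that -status prints "Number of maps:" and "Number of reduces:" lines as Hadoop 2.x does (the exact output text varies between versions, so adjust the patterns to what your cluster prints):

    # Sketch: shell out to the mapred CLI and parse its output.
    import re
    import subprocess

    def running_job_ids():
        """Return the IDs of all jobs reported by 'mapred job -list'."""
        out = subprocess.check_output(["mapred", "job", "-list"]).decode()
        return re.findall(r"\bjob_\d+_\d+\b", out)

    def map_reduce_counts(job_id):
        """Return (num_maps, num_reduces) reported for one job."""
        out = subprocess.check_output(["mapred", "job", "-status", job_id]).decode()
        # Assumed output lines; verify against your Hadoop version.
        maps = re.search(r"Number of maps:\s*(\d+)", out)
        reduces = re.search(r"Number of reduces:\s*(\d+)", out)
        return (int(maps.group(1)) if maps else 0,
                int(reduces.group(1)) if reduces else 0)

    for job_id in running_job_ids():
        m, r = map_reduce_counts(job_id)
        print("%s: %d maps, %d reduces" % (job_id, m, r))

Screen-scraping the CLI is simpler to set up than the REST API, but its output format is not a stable interface, so the REST approach from Answer 1 is the more robust choice for long-lived tooling.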
Source: https://stackoverflow.com/questions/27160390/how-can-i-tell-how-many-mappers-and-reducers-are-running