Question
I have created a 4-node Hadoop cluster. I start all of the daemons: the DataNodes, the NameNode, the ResourceManager, etc.
To find out whether all of my nodes are actually working, I tried the following procedure:
Step 1. I run my program when all nodes are active.
Step 2. I run my program when only the master is active.
The completion times in both cases were almost the same.
So I would like to know whether there is any other way to tell how many nodes are actually used while the program is running.
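For reference, one way to check this on a YARN cluster is to ask the ResourceManager which NodeManagers are registered and how many containers each is running while the job executes. The following is a minimal sketch, assuming a Hadoop 2.x/YARN installation with the cluster's configuration files on the classpath (the class name ListClusterNodes is just illustrative):

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListClusterNodes {
    public static void main(String[] args) throws Exception {
        // Picks up yarn-site.xml / core-site.xml from the classpath
        YarnConfiguration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        try {
            // Only NodeManagers in the RUNNING state can receive work
            List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
            System.out.println("Running NodeManagers: " + nodes.size());
            for (NodeReport node : nodes) {
                // getNumContainers() shows how busy each node is right now
                System.out.println(node.getNodeId() + " containers=" + node.getNumContainers());
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```

If this reports only one node, or all containers land on the master while the job runs, the work is not actually being distributed.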
Answer 1:
Discussed in chat. The problem is caused by an incorrect Hadoop installation: in both cases the job was started locally via the LocalJobRunner, so it never used the cluster, which is why the completion times were the same.
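A quick way to confirm this from the client side is to print the two properties that decide where a job runs. This is a small sketch assuming the cluster's *-site.xml files are on the classpath; if they are not, Hadoop falls back to the local defaults shown in the comments:

```java
import org.apache.hadoop.mapred.JobConf;

public class CheckJobRunner {
    public static void main(String[] args) {
        // JobConf loads core-site.xml and mapred-site.xml from the classpath,
        // on top of the built-in defaults
        JobConf conf = new JobConf();

        // Defaults to "local", which means LocalJobRunner in the client JVM
        String framework = conf.get("mapreduce.framework.name", "local");
        // Defaults to "file:///", i.e. the local filesystem instead of HDFS
        String defaultFs = conf.get("fs.defaultFS", "file:///");

        System.out.println("mapreduce.framework.name = " + framework);
        System.out.println("fs.defaultFS             = " + defaultFs);

        if (!"yarn".equals(framework)) {
            System.out.println("Jobs will run in-process via LocalJobRunner, not on the cluster.");
        }
    }
}
```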
Recommendations:
- Install Hadoop using Ambari (http://ambari.apache.org/)
- Change platform to CentOS 6.4+
- Use Oracle JDK 7
- Pay close attention to host names and firewall settings
- Get familiar with the cluster health-diagnostic commands and the default Hadoop WebUIs (see the sketch below)
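As a rough sketch of what those health checks show (assuming a Hadoop 2.x cluster, where the CLI equivalent is `hdfs dfsadmin -report` and the default WebUIs are the NameNode on port 50070 and the ResourceManager on port 8088), the list of live DataNodes can also be read programmatically:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ListLiveDataNodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // If fs.defaultFS still points at file:///, this cast fails --
        // another sign the installation is not actually using HDFS
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // Mirrors the DataNode section of `hdfs dfsadmin -report`
        DatanodeInfo[] dataNodes = dfs.getDataNodeStats();
        System.out.println("Live DataNodes: " + dataNodes.length);
        for (DatanodeInfo dn : dataNodes) {
            System.out.println(dn.getHostName()
                    + " capacity=" + dn.getCapacity()
                    + " remaining=" + dn.getRemaining());
        }
        fs.close();
    }
}
```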
Source: https://stackoverflow.com/questions/27028288/hadoop-cluster-is-using-only-master-node-or-all-nodes