Question
I have a fully-distributed Hadoop cluster with 4 nodes. When I submit my job, the JobTracker decides that 12 map tasks are appropriate, but something strange happens: all 12 map tasks run on a single node instead of being spread across the cluster. Before asking this question, I had already tried the following:
- Trying a different job
- Running start-balancer.sh to rebalance the cluster
Neither helped, so I hope someone can tell me why this happens and how to fix it.
Answer 1:
If all the blocks of the input data files are on that one node, the scheduler will prioritize that node, since Hadoop tries to run each map task on a node that holds the data locally.
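One way to check this (a sketch that needs a running cluster; the path /user/hadoop/input is a placeholder for your actual job input):

```shell
# Show every block of the input and which DataNodes hold its replicas.
# If every block's replica list names the same node, data locality
# explains why all map tasks were scheduled there.
hadoop fsck /user/hadoop/input -files -blocks -locations
```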
Answer 2:
Apparently the source data files are all on one data node right now. That can't be the balancer's fault. From what I can see, either your HDFS replication factor is 1, or you are not actually running a fully-distributed Hadoop cluster.
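You can verify the replication factor directly (a sketch against a live cluster; /user/hadoop/input and the part-0 file name are hypothetical):

```shell
# Print the replication factor of one input file; 1 means a single copy
# exists, so every map task must read from the one node that holds it.
hdfs dfs -stat %r /user/hadoop/input/part-0

# Raise replication to 3 and wait until the extra copies are written,
# giving the scheduler more nodes with local data to choose from.
hdfs dfs -setrep -w 3 /user/hadoop/input
```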
Answer 3:
Check how your input is being split. You may have only one input split, meaning that only one node will be used to process the data. You can test this by adding more input files to your system, placing them on different nodes, and then checking which nodes are doing the work.
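The split arithmetic can be sketched with plain shell arithmetic. The sizes below are assumptions, but they illustrate the rule of thumb: FileInputFormat creates roughly one split per HDFS block, and the framework runs one map task per split.

```shell
# Hypothetical input: a single 1536 MB file with an assumed 128 MB block size.
FILE_SIZE=$((1536 * 1024 * 1024))
BLOCK_SIZE=$((128 * 1024 * 1024))

# Roughly one split per block (ceiling division), hence one map task per block.
NUM_SPLITS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$NUM_SPLITS"
```

With these assumed numbers you get 12 splits, and therefore 12 map tasks; if all 12 blocks sit on one node, all 12 tasks can end up there too.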
If that doesn't work, check that your cluster is configured correctly. Specifically, make sure your name node lists the paths to your other nodes in its slaves file, and that each slave node has your name node set in its masters file.
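The last point can be checked from the shell (a sketch; $HADOOP_CONF_DIR is a placeholder for wherever your configuration lives, e.g. conf/ under the Hadoop install):

```shell
# On the master: the slaves file should list every worker node, one per line.
cat "$HADOOP_CONF_DIR/slaves"

# On each worker: the masters file should name the master node.
cat "$HADOOP_CONF_DIR/masters"

# After fixing either file, restart the daemons so the change takes effect.
stop-all.sh && start-all.sh
```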
Source: https://stackoverflow.com/questions/12135949/why-map-task-always-running-on-a-single-node