Question
I have a fully-distributed Hadoop cluster with 4 nodes. When I submit my job, the JobTracker decides that 12 map tasks are appropriate, but something strange happens: all 12 map tasks run on a single node instead of being spread across the cluster. Before asking this question, I had already tried the following:
- Trying a different job
- Running start-balancer.sh to rebalance the cluster
Neither helped, so I hope someone can tell me why this happens and how to fix it.
Answer 1:
If all the blocks of the input data files are on that one node, the scheduler will prioritize that node, since Hadoop tries to run each map task on a node that holds the data locally.
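One way to check this (a sketch that needs a running cluster; the path /user/hadoop/input is a placeholder for your actual job input):

```shell
# Show every block of the input and which DataNodes hold its replicas.
# If every block's replica list names the same node, data locality
# explains why all map tasks were scheduled there.
hadoop fsck /user/hadoop/input -files -blocks -locations
```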
Answer 2:
Apparently the source data files are all on one data node right now. That can't be the balancer's fault. From what I can see, either your HDFS replication factor is 1, or you are not actually running a fully-distributed Hadoop cluster.
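You can verify the replication factor directly (a sketch against a live cluster; /user/hadoop/input and the part-0 file name are hypothetical):

```shell
# Print the replication factor of one input file; 1 means a single copy
# exists, so every map task must read from the one node that holds it.
hdfs dfs -stat %r /user/hadoop/input/part-0

# Raise replication to 3 and wait until the extra copies are written,
# giving the scheduler more nodes with local data to choose from.
hdfs dfs -setrep -w 3 /user/hadoop/input
```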
Answer 3:
Check how your input is being split. You may have only one input split, meaning that only one node will be used to process the data. You can test this by adding more input files to your system, placing them on different nodes, and then checking which nodes are doing the work.
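The split arithmetic can be sketched with plain shell arithmetic. The sizes below are assumptions, but they illustrate the rule of thumb: FileInputFormat creates roughly one split per HDFS block, and the framework runs one map task per split.

```shell
# Hypothetical input: a single 1536 MB file with an assumed 128 MB block size.
FILE_SIZE=$((1536 * 1024 * 1024))
BLOCK_SIZE=$((128 * 1024 * 1024))

# Roughly one split per block (ceiling division), hence one map task per block.
NUM_SPLITS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$NUM_SPLITS"
```

With these assumed numbers you get 12 splits, and therefore 12 map tasks; if all 12 blocks sit on one node, all 12 tasks can end up there too.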
If that doesn't work, check that your cluster is configured correctly. Specifically, make sure your name node lists the paths to your other nodes in its slaves file, and that each slave node has your name node set in its masters file.
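The last point can be checked from the shell (a sketch; $HADOOP_CONF_DIR is a placeholder for wherever your configuration lives, e.g. conf/ under the Hadoop install):

```shell
# On the master: the slaves file should list every worker node, one per line.
cat "$HADOOP_CONF_DIR/slaves"

# On each worker: the masters file should name the master node.
cat "$HADOOP_CONF_DIR/masters"

# After fixing either file, restart the daemons so the change takes effect.
stop-all.sh && start-all.sh
```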
Source: https://stackoverflow.com/questions/12135949/why-map-task-always-running-on-a-single-node