Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

Submitted by ♀尐吖头ヾ on 2020-01-11 03:59:07

Question


I am new to Hadoop and I am trying to run the wordcount example. I have a cluster of 4 nodes, all virtual machines on my computer. Every time, the job completes the map tasks, but the reduce task fails at about 16% with this error:

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

12/05/24 04:43:12 WARN mapred.JobClient: Error reading task outputmachine3-VirtualBox

It looks like the slaves are unable to retrieve data from the other slaves. Some links I found say this can be caused by inconsistencies in the /etc/hosts file, but I have cross-checked the files on all nodes and they are consistent. Can anyone help me out?


Answer 1:


Is a firewall blocking communication between the cluster nodes on the common Hadoop ports (50060 for the TaskTracker in this case)? Test by running curl from one node to another on port 50060 and check that you get an HTTP response code:

curl -I http://node1:50060/

Be sure to replace 'node1' in the above with each of the hostnames in the $HADOOP_HOME/conf/slaves file, and repeat the test from every node.
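
The curl check above can be repeated for every entry in the slaves file with a small loop. This is a minimal sketch, assuming the slaves file lists one hostname per line and that the TaskTracker listens on port 50060 as described above; the function names list_slaves and probe_tasktrackers are made up for illustration and are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: probe the TaskTracker HTTP port on every host in a slaves file.
# list_slaves and probe_tasktrackers are illustrative names, not Hadoop tools.

# Print the hostnames from a slaves file, skipping blank lines and # comments.
list_slaves() {
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -v '^$'
}

# Send a HEAD request (curl -I) to port 50060 on each host and report the result.
probe_tasktrackers() {
  local slaves_file="$1" host
  for host in $(list_slaves "$slaves_file"); do
    if curl -sS -I --max-time 5 "http://${host}:50060/" >/dev/null 2>&1; then
      echo "OK   ${host}"
    else
      echo "FAIL ${host}"
    fi
  done
}

# Usage (run on each node in turn):
#   probe_tasktrackers "$HADOOP_HOME/conf/slaves"
```

A FAIL line only says that curl could not complete the request within 5 seconds; it does not distinguish a firewall block from a name-resolution failure, so run curl -I by hand against a failing host to see the actual error.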

EDIT: It turns out this is most probably a DNS problem. Here's what you should try:

  • Examine the ${HADOOP_HOME}/conf/slaves file. Each entry in it needs to be in the /etc/hosts file on every node in your cluster, or be resolvable through your network's DNS server
  • Once you've verified the hosts file ON EVERY NODE in your cluster (or configured your DNS server), log into each node and check that you can ping the other cluster nodes by the names in the slaves file. Finally, verify that you can curl the TaskTracker (port 50060) from each node to every other node (again using the machine names in the slaves file)
  • Restart your MapReduce services, just to be safe
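
The first step above, comparing the slaves file against /etc/hosts, can be sketched as a small script. It assumes the usual hosts-file layout of an IP address followed by one or more names, and missing_from_hosts is a hypothetical helper name, not a Hadoop tool.

```shell
#!/usr/bin/env bash
# Sketch: list slaves-file entries that have no matching name in a hosts file.
# missing_from_hosts is an illustrative name, not a Hadoop tool.

# Usage: missing_from_hosts SLAVES_FILE [HOSTS_FILE]
# Prints each slave name that does not appear as a hostname or alias in
# HOSTS_FILE (default: /etc/hosts). Prints nothing if every name is present.
missing_from_hosts() {
  local slaves_file="$1" hosts_file="${2:-/etc/hosts}" name
  # Strip comments and surrounding whitespace, then drop empty lines.
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$slaves_file" |
    grep -v '^$' |
    while IFS= read -r name; do
      # Field 1 of a hosts line is the IP address; the remaining fields are names.
      if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
        echo "$name"
      fi
    done
}

# Usage (run on every node):
#   missing_from_hosts "$HADOOP_HOME/conf/slaves"
```

This only inspects the hosts file. Names that are served by DNS instead will be reported as missing, so the ping and curl checks remain the real test of whether the nodes can reach each other.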



Answer 2:


Check the hostname on each node by running hostname in a terminal. Make sure it returns the name you expect for that machine (master on the master node, slave on a slave node). If it does not, put the correct node name (master/slave) in /etc/hostname and restart the system. It should then work.
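
The hostname comparison can be scripted as well. This is a minimal sketch; check_hostname is a hypothetical helper, and the names master and slave are just the examples used above.

```shell
#!/usr/bin/env bash
# Sketch: compare this machine's hostname with the name it should have.
# check_hostname is an illustrative name, not a standard tool.

# Usage: check_hostname EXPECTED [ACTUAL]
# ACTUAL defaults to the output of the hostname command.
check_hostname() {
  local expected="$1"
  local actual="${2:-$(hostname)}"
  if [ "$actual" = "$expected" ]; then
    echo "hostname matches: $actual"
  else
    echo "hostname is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Usage on the master node:
#   check_hostname master
```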




Source: https://stackoverflow.com/questions/10729543/shuffle-errorexceeded-max-failed-unique-matche-bailing-out
