Hadoop safemode recovery - taking a lot of time

六眼飞鱼酱① submitted on 2019-12-06 22:16:34

The time spent in safe mode is usually proportional to the size of the cluster. That said, normal time is on the order of minutes at most, not hours. There are a few things to check.

  1. Confirm that all data nodes are starting up correctly. It is normal for a data node to take a few seconds, or a few minutes if it holds a large number of blocks, to report in. Check the data node logs to see what is happening during startup.
  2. Ensure you have enough name node handler threads (dfs.namenode.handler.count in hdfs-site.xml) to serve all the data nodes that are trying to check in. The default is 10, which should be fine for clusters of up to 20 nodes or so. Beyond that, it may make sense to increase it. When there are too few threads, you may see retries in the data node logs, and that is what the retry messages seem to indicate to me (e.g. retry 21 times).
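
To check which handler count is actually in effect, something along these lines may help. It is only a minimal sketch: the `handler_count` helper and the sample configuration (including the value 64) are made up for illustration, and the fallback of 10 is simply the default mentioned above. On a real cluster you would read the contents of your own hdfs-site.xml instead of the sample string.

```python
import xml.etree.ElementTree as ET

# Default quoted in the answer above; used when the property is not set.
DEFAULT_HANDLER_COUNT = 10

# Hypothetical hdfs-site.xml content. The value 64 is illustrative only,
# not a recommendation.
SAMPLE_HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>
</configuration>
"""


def handler_count(hdfs_site_xml: str) -> int:
    """Return dfs.namenode.handler.count from the given XML text,
    or the default if the property is absent."""
    root = ET.fromstring(hdfs_site_xml.strip())
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.namenode.handler.count":
            return int(prop.findtext("value", "").strip())
    return DEFAULT_HANDLER_COUNT


if __name__ == "__main__":
    print(handler_count(SAMPLE_HDFS_SITE))
```

If the property is missing from the file, the helper returns the default, which is the case where a larger cluster is most likely to run short of handler threads.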

Hope this helps.
