change number of data nodes in Hadoop

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-05 07:12:29

Question


How can I change the number of data nodes, that is, disable and enable certain data nodes to test scalability? To be more specific, I have 4 data nodes and I want to measure performance with 1, 2, 3, and 4 data nodes in turn. Would it be enough to just update the slaves file on the namenode?


Answer 1:


The correct way to temporarily decommission a node:

  1. Create an "exclude file". This lists the hosts, one per line, that you wish to remove.
  2. Set dfs.hosts.exclude and mapred.hosts.exclude to the location of this file.
  3. Update the namenode and jobtracker by doing hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes
  4. This will start the decomissioning process. All of the data that used to be replicated on those nodes will be copied off of them and onto other nodes. You can check the progress through the web UI.
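
A minimal sketch of steps 1-3, assuming a Hadoop 1.x layout; the config directory /usr/local/hadoop/conf, the exclude-file path, and the hostnames datanode3/datanode4 are illustrative assumptions, not part of the original answer:

    # 1. List the datanodes to decommission, one hostname per line
    #    (hypothetical path)
    echo "datanode3" >  /usr/local/hadoop/conf/dfs.exclude
    echo "datanode4" >> /usr/local/hadoop/conf/dfs.exclude

    # 2. Point HDFS and MapReduce at the exclude file
    #    (in conf/hdfs-site.xml)
    #    <property>
    #      <name>dfs.hosts.exclude</name>
    #      <value>/usr/local/hadoop/conf/dfs.exclude</value>
    #    </property>
    #    (in conf/mapred-site.xml)
    #    <property>
    #      <name>mapred.hosts.exclude</name>
    #      <value>/usr/local/hadoop/conf/dfs.exclude</value>
    #    </property>

    # 3. Tell the namenode and jobtracker to re-read the host lists
    hadoop dfsadmin -refreshNodes
    hadoop mradmin -refreshNodes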

Note that those nodes will no longer be used for MR jobs as soon as you run hadoop mradmin -refreshNodes, but they will still hold data, so if you run something before decommissioning is complete you may see network latency that you would not see otherwise. For a fully realistic test, wait until decommissioning has finished.

To add the nodes back, simply remove them from the exclude file and run the -refreshNodes commands again.
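
A minimal sketch of the re-add step, assuming the same hypothetical exclude-file path as above; starting the daemons on the re-added hosts is only needed if they were shut down in the meantime:

    # Empty the exclude file (or remove only the hosts you want back)
    > /usr/local/hadoop/conf/dfs.exclude

    # Refresh the namenode and jobtracker
    hadoop dfsadmin -refreshNodes
    hadoop mradmin -refreshNodes

    # On each re-added node, start the daemons if they were stopped
    hadoop-daemon.sh start datanode
    hadoop-daemon.sh start tasktracker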




Answer 2:


The slaves file is used only by helper scripts such as start-dfs.sh and can be ignored if you don't use those scripts. So you can leave it empty and add or remove datanodes from the cluster simply by turning them on and off.
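
A minimal sketch of turning a node on and off by hand, assuming a Hadoop 1.x installation with hadoop-daemon.sh on the PATH; run these on the data node itself:

    # Bring the datanode (and tasktracker, if you run MapReduce) online
    hadoop-daemon.sh start datanode
    hadoop-daemon.sh start tasktracker

    # Take it offline again for the next round of the scalability test
    hadoop-daemon.sh stop datanode
    hadoop-daemon.sh stop tasktracker

Unlike decommissioning, stopping a datanode this way leaves its blocks under-replicated until HDFS notices the node is dead and re-replicates them, so results right after a stop may not be representative.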



Source: https://stackoverflow.com/questions/12508169/change-number-of-data-nodes-in-hadoop
