change number of data nodes in Hadoop

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-05 07:12:29

Question


How can I change the number of data nodes, that is, disable and enable certain data nodes to test scalability? To be more specific, I have 4 data nodes and I want to measure performance with 1, 2, 3, and 4 data nodes in turn. Would it be enough to just update the slaves file on the namenode?


Answer 1:


The correct way to temporarily decommission a node:

  1. Create an "exclude file". This lists the hosts, one per line, that you wish to remove.
  2. Set dfs.hosts.exclude and mapred.hosts.exclude to the location of this file.
  3. Update the namenode and jobtracker by doing hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes
  4. This will start the decomissioning process. All of the data that used to be replicated on those nodes will be copied off of them and onto other nodes. You can check the progress through the web UI.
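
A minimal sketch of steps 1-3, assuming a Hadoop 1.x layout; the config directory /usr/local/hadoop/conf, the exclude-file path, and the hostnames datanode3/datanode4 are illustrative assumptions, not part of the original answer:

    # 1. List the datanodes to decommission, one hostname per line
    #    (hypothetical path)
    echo "datanode3" >  /usr/local/hadoop/conf/dfs.exclude
    echo "datanode4" >> /usr/local/hadoop/conf/dfs.exclude

    # 2. Point HDFS and MapReduce at the exclude file
    #    (in conf/hdfs-site.xml)
    #    <property>
    #      <name>dfs.hosts.exclude</name>
    #      <value>/usr/local/hadoop/conf/dfs.exclude</value>
    #    </property>
    #    (in conf/mapred-site.xml)
    #    <property>
    #      <name>mapred.hosts.exclude</name>
    #      <value>/usr/local/hadoop/conf/dfs.exclude</value>
    #    </property>

    # 3. Tell the namenode and jobtracker to re-read the host lists
    hadoop dfsadmin -refreshNodes
    hadoop mradmin -refreshNodes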

Note that those nodes will no longer be used for MR jobs as soon as you run hadoop mradmin -refreshNodes, but they will still hold data, so if you run something before decommissioning is complete you may see network latency that you would not see otherwise. For a fully realistic test, wait until decommissioning has finished.

To add the nodes back, simply remove them from the exclude file and run the -refreshNodes commands again.
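
A minimal sketch of the re-add step, assuming the same hypothetical exclude-file path as above; starting the daemons on the re-added hosts is only needed if they were shut down in the meantime:

    # Empty the exclude file (or remove only the hosts you want back)
    > /usr/local/hadoop/conf/dfs.exclude

    # Refresh the namenode and jobtracker
    hadoop dfsadmin -refreshNodes
    hadoop mradmin -refreshNodes

    # On each re-added node, start the daemons if they were stopped
    hadoop-daemon.sh start datanode
    hadoop-daemon.sh start tasktracker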




Answer 2:


The slaves file is used only by helper scripts such as start-dfs.sh and can be ignored if you don't use those scripts. So you can leave it empty and add or remove datanodes from the cluster simply by turning them on and off.
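
A minimal sketch of turning a node on and off by hand, assuming a Hadoop 1.x installation with hadoop-daemon.sh on the PATH; run these on the data node itself:

    # Bring the datanode (and tasktracker, if you run MapReduce) online
    hadoop-daemon.sh start datanode
    hadoop-daemon.sh start tasktracker

    # Take it offline again for the next round of the scalability test
    hadoop-daemon.sh stop datanode
    hadoop-daemon.sh stop tasktracker

Unlike decommissioning, stopping a datanode this way leaves its blocks under-replicated until HDFS notices the node is dead and re-replicates them, so results right after a stop may not be representative.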



Source: https://stackoverflow.com/questions/12508169/change-number-of-data-nodes-in-hadoop
