Question
My Hadoop version is 2.5.2. I am changing dfs.blocksize in the hdfs-site.xml file on the master node. I have the following questions:
1) Will this change affect the existing data in HDFS?
2) Do I need to propagate this change to all the nodes in the Hadoop cluster, or is changing it only on the NameNode sufficient?
Answer 1:
You should make the change in hdfs-site.xml on all the slaves as well. dfs.blocksize should be consistent across all DataNodes.
Answer 2:
1) Will this change affect the existing data in HDFS?
No, it will not. Existing files keep their old block size. For them to pick up the new block size, you need to rewrite the data. You can do either a hadoop fs -cp or a distcp on your data. The new copy will have the new block size, and you can then delete the old data.
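A minimal sketch of that rewrite for a single file, assuming the `hadoop` CLI is on the PATH and the new dfs.blocksize is already in the client's configuration. The function name, the `.reblock` suffix, and the path are hypothetical. `hadoop fs -stat %o` prints a file's block size, so it can confirm that the copy picked up the new value before the old file is removed.

```shell
#!/usr/bin/env bash
# Sketch: rewrite one HDFS file so it picks up the currently configured block size.
# Assumes the `hadoop` CLI is on PATH; the path and the .reblock suffix are hypothetical.

rewrite_with_new_blocksize() {
  local src="$1" tmp="$1.reblock"

  # %o is the block size of the file, in bytes.
  echo "old block size: $(hadoop fs -stat %o "$src")"

  # Copying writes a new file, which uses the client's current dfs.blocksize.
  hadoop fs -cp "$src" "$tmp" || return 1

  echo "new block size: $(hadoop fs -stat %o "$tmp")"

  # Replace the original only after the copy succeeded.
  hadoop fs -rm "$src" && hadoop fs -mv "$tmp" "$src"
}

# Usage (hypothetical path):
#   rewrite_with_new_blocksize /data/events.log
```

For large directory trees, distcp is usually the better fit; note that its `-pb` option preserves the source block size, which would defeat the purpose here.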
2) Do I need to propagate this change to all the nodes in the Hadoop cluster, or is changing it only on the NameNode sufficient?
I believe in this case you only need to change the NameNode. However, this is a very bad idea. You need to keep all of your configuration files in sync for a number of good reasons. When you get more serious about your Hadoop deployment, you should probably start using something like Puppet or Chef to manage your configs.
Also, note that whenever you change a configuration, you need to restart the NameNode and DataNodes in order for them to change their behavior.
Interesting note: you can set the block size of individual files as you write them, overriding the default block size. E.g., hadoop fs -D dfs.blocksize=134217728 -put a b
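The value 134217728 in the command above is 128 MB expressed in bytes. As a rough illustration of what a block size means for a file, the sketch below converts megabytes to bytes and estimates how many blocks a file of a given size would occupy. The function names and the 300 MB file size are hypothetical, chosen only for the example.

```shell
#!/usr/bin/env bash
# Sketch: relate a block size in MB to its value in bytes,
# and estimate how many blocks a file of a given size occupies.

mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Ceiling division: a final partial block still counts as one block.
block_count() {
  local file_bytes="$1" block_bytes="$2"
  echo $(( (file_bytes + block_bytes - 1) / block_bytes ))
}

mb_to_bytes 128                                          # prints 134217728
block_count "$(mb_to_bytes 300)" "$(mb_to_bytes 128)"    # prints 3
block_count "$(mb_to_bytes 300)" "$(mb_to_bytes 64)"     # prints 5
```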
Answer 3:
Changing the block size in hdfs-site.xml will only affect new data.
Answer 4:
Which distribution are you using? From your question, it looks like you are using the Apache distribution. The easiest way I can find is to write a shell script that first deletes hdfs-site.xml on the slaves, like:
ssh username@domain.com 'rm /some/hadoop/conf/hdfs-site.xml'
ssh username@domain2.com 'rm /some/hadoop/conf/hdfs-site.xml'
ssh username@domain3.com 'rm /some/hadoop/conf/hdfs-site.xml'
Then copy hdfs-site.xml from the master to all the slaves:
scp /hadoop/conf/hdfs-site.xml username@domain.com:/hadoop/conf/
scp /hadoop/conf/hdfs-site.xml username@domain2.com:/hadoop/conf/
scp /hadoop/conf/hdfs-site.xml username@domain3.com:/hadoop/conf/
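The same steps can be sketched as a loop instead of one line per host. This assumes passwordless SSH to each slave; the function name is hypothetical and the hosts and paths are the placeholders used above. Since scp overwrites an existing file, the sketch skips the separate rm step.

```shell
#!/usr/bin/env bash
# Sketch: copy one config file to the same directory on several hosts.
# Host names and paths are placeholders; assumes passwordless SSH is set up.

push_config() {
  local file="$1" dest_dir="$2"
  shift 2
  local host
  for host in "$@"; do
    # scp overwrites an existing file, so no separate rm step is needed.
    scp "$file" "$host:$dest_dir/" || { echo "failed: $host" >&2; return 1; }
    echo "updated: $host"
  done
}

# Usage (placeholder hosts):
#   push_config /hadoop/conf/hdfs-site.xml /hadoop/conf \
#     username@domain.com username@domain2.com username@domain3.com
```

Stopping at the first failed host makes it obvious which nodes still hold the old file, which matters if the configs are meant to stay in sync.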
Source: https://stackoverflow.com/questions/28586401/does-changing-the-value-of-dfs-blocksizeaffect-existing-data