Does changing the value of dfs.blocksize affect existing data


Question


My Hadoop version is 2.5.2. I am changing dfs.blocksize in the hdfs-site.xml file on the master node. I have the following questions:

1) Will this change affect the existing data in HDFS?
2) Do I need to propagate this change to all the nodes in the Hadoop cluster, or is changing it only on the NameNode sufficient?


Answer 1:


You should make the change in hdfs-site.xml on all the slaves as well; dfs.blocksize should be consistent across all DataNodes.
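
The property to keep in sync is dfs.blocksize in hdfs-site.xml. As a sketch, a 128 MB block size (the value here is only an illustration) would look like:

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>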




Answer 2:


1) Will this change affect the existing data in HDFS

No, it will not. Existing files keep their old block size. In order for the data to take on the new block size, you need to rewrite it: you can either do a hadoop fs -cp or a distcp on your data. The new copy will have the new block size, and you can then delete your old data.
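
As a rough sketch (the paths below are placeholders, not from the question), rewriting a directory so the copy picks up the new default block size could look like:

hadoop distcp /data/old /data/old_rewritten    # copy is written with the current default dfs.blocksize
hadoop fs -rm -r /data/old                     # remove the original once the copy is verified
hadoop fs -mv /data/old_rewritten /data/old    # move the rewritten copy back to the original path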

2) Do I need to propagate this change to all the nodes in the Hadoop cluster, or is changing it only on the NameNode sufficient?

I believe in this case you only need to change the NameNode. However, this is a very very bad idea. You need to keep all of your configuration files in sync for a number of good reasons. When you get more serious about your Hadoop deployment, you should probably start using something like Puppet or Chef to manage your configs.

Also, note that whenever you change a configuration, you need to restart the NameNode and DataNodes in order for them to change their behavior.
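
On a plain Apache tarball install, assuming $HADOOP_HOME points at the installation directory, that can be as simple as:

$HADOOP_HOME/sbin/stop-dfs.sh     # stops the NameNode, DataNodes and SecondaryNameNode
$HADOOP_HOME/sbin/start-dfs.sh    # starts them again with the updated configuration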

Interesting note: you can set the block size of individual files as you write them, overriding the default block size. E.g., hadoop fs -D dfs.blocksize=134217728 -put a b
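
You can verify the block size a file actually ended up with using fs -stat (the path is a placeholder):

hadoop fs -stat "block size: %o, name: %n" /path/to/file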




Answer 3:


Changing the block size in hdfs-site.xml will only affect new data.




Answer 4:


Which distribution are you using? From your question it looks like you are using the Apache distribution. The easiest way I can think of is to write a shell script that first deletes hdfs-site.xml on the slaves:

ssh username@domain.com 'rm /some/hadoop/conf/hdfs-site.xml'
ssh username@domain2.com 'rm /some/hadoop/conf/hdfs-site.xml'
ssh username@domain3.com 'rm /some/hadoop/conf/hdfs-site.xml'

and then copies hdfs-site.xml from the master to all the slaves:

scp /hadoop/conf/hdfs-site.xml username@domain.com:/hadoop/conf/ 
scp /hadoop/conf/hdfs-site.xml username@domain2.com:/hadoop/conf/ 
scp /hadoop/conf/hdfs-site.xml username@domain3.com:/hadoop/conf/ 
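
If the list of slaves grows, the same copy can be wrapped in a loop (the hostnames and paths are the same placeholders as above):

for host in domain.com domain2.com domain3.com; do
    scp /hadoop/conf/hdfs-site.xml username@$host:/hadoop/conf/
done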


Source: https://stackoverflow.com/questions/28586401/does-changing-the-value-of-dfs-blocksizeaffect-existing-data
