Question
I'm a newbie to HDFS, so sorry if my question is naive.
Suppose we store files in a Hadoop cluster. Some files are really popular and will be requested much more often than others (but not often enough to justify holding them in memory). It would be worth keeping more copies (replicas) of those files.
Can I implement this in HDFS, or is there a best practice for tackling this task?
Answer 1:
Yes, you can do this for the entire cluster, for a directory, or for an individual file.
You can change the replication factor (let's say to 3) of a single file using the Hadoop FS shell:
[sys@localhost ~]$ hadoop fs -setrep -w 3 /my/file
Alternatively, you can change the replication factor (again, say to 3) of all the files under a directory:
[sys@localhost ~]$ hadoop fs -setrep -R -w 3 /my/dir
To change the replication factor of everything in HDFS to 1:
[sys@localhost ~]$ hadoop fs -setrep -R -w 1 /
But the requested replication factor must lie between the dfs.replication.min and dfs.replication.max values.
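If you need to change replication from application code rather than the shell, the HDFS Java API exposes the same operation through FileSystem.setReplication. Below is a minimal sketch; the path /my/file and the factor 3 simply mirror the shell examples above, and the explicit dfs.replication.max check is only illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplicationExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            short target = 3;

            // The NameNode rejects factors outside the configured bounds,
            // so this check is purely illustrative.
            int max = conf.getInt("dfs.replication.max", 512);
            if (target > max) {
                throw new IllegalArgumentException(
                        "Requested factor " + target + " exceeds dfs.replication.max=" + max);
            }

            // Equivalent of `hadoop fs -setrep 3 /my/file` (without -w,
            // i.e. it does not wait for re-replication to finish).
            boolean ok = fs.setReplication(new Path("/my/file"), target);
            System.out.println("Replication change accepted: " + ok);
        }
    }
}

Note that the call only updates the file's target replication; the extra block copies are created asynchronously by the NameNode.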
Source: https://stackoverflow.com/questions/37111653/hdfs-can-i-specify-replication-factor-per-file-to-increase-avaliability