data block size in HDFS, why 64MB?

既然无缘 2020-11-28 21:43

The default data block size of HDFS/Hadoop is 64MB. The block size on disk is generally 4KB. What does a 64MB block size mean? Does it mean that the smallest unit of a read from disk is 64MB?
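For reference, a minimal Java sketch (assuming a reachable HDFS cluster; the path /data/sample.txt is only a placeholder) that prints the block size recorded for an existing file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeCheck {
        public static void main(String[] args) throws Exception {
            // Configuration picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The block size is stored as per-file metadata; new files default to dfs.blocksize.
            FileStatus status = fs.getFileStatus(new Path("/data/sample.txt")); // placeholder path
            System.out.println("HDFS block size: " + status.getBlockSize() + " bytes");
            System.out.println("File length:     " + status.getLen() + " bytes");
        }
    }

FileSystem.create also has overloads that take an explicit blockSize, so the 64MB figure is a per-file default rather than a fixed unit of disk I/O.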

8 Answers
  •  春和景丽
    2020-11-28 22:32

    The reason Hadoop chose 64MB is that Google chose 64MB. The reason Google chose 64MB was a Goldilocks argument.

    Having a much smaller block size would cause seek overhead to increase.

    Having a moderately smaller block size makes map tasks run fast enough that the cost of scheduling them becomes comparable to the cost of running them.

    Having a significantly larger block size begins to decrease the available read parallelism and may ultimately make it hard to schedule tasks local to the data.

    See Google Research Publication: MapReduce http://research.google.com/archive/mapreduce.html
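    To make the seek-overhead point concrete, here is a rough back-of-the-envelope sketch; the 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions, not figures from the paper:

        public class SeekOverhead {
            public static void main(String[] args) {
                double seekMs = 10.0;         // assumed average seek + rotational latency
                double transferMBps = 100.0;  // assumed sequential transfer rate

                // Fraction of each block read spent on the initial seek, by block size.
                long[] blockSizesMB = {1, 4, 16, 64, 256};
                for (long mb : blockSizesMB) {
                    double transferMs = mb / transferMBps * 1000.0;
                    double overheadPct = seekMs / (seekMs + transferMs) * 100.0;
                    System.out.printf("%4d MB block: seek is %.1f%% of read time%n", mb, overheadPct);
                }
            }
        }

    With these assumed numbers, a 64MB block spends roughly 1.5% of its read time seeking, while a 1MB block spends about 50%, which is the penalty of the "much smaller block size" case above.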
