HDFS - Block Size Related

Submitted by 这一生的挚爱 on 2020-01-17 06:07:28

Question


I have files of only 10 MB each. I think that in HDFS the first file consumes 10 MB and the remaining 54 MB is freed up to add to the available space. My question is:

  1. Would the second 10 MB file (or the following sequence of 10 MB files) keep being added to this block until it reaches 64 MB? For example, if in total we consume 2 blocks of 64 MB each and 20 MB of a 3rd block, will the input split give 3 outputs: two of 64 MB and one of 20 MB? Is that true?

Answer 1:


With reference to Hadoop: The Definitive Guide:

HDFS stores small files inefficiently, since each file is stored in a block, and block metadata is held in memory by the namenode. Thus, a large number of small files can eat up a lot of memory on the namenode. (Note, however, that small files do not take up any more disk space than is required to store the raw contents of the file. For example, a 1 MB file stored with a block size of 128 MB uses 1 MB of disk space, not 128 MB.)

So you are right that in HDFS the first file consumes only 10 MB, and the remaining 54 MB is freed up and added to the available space.

However, HDFS blocks are not a physical storage allocation unit, but a logical storage allocation unit. A block belongs to exactly one file, so later files do not keep being added to this block until it reaches 64 MB (the block size); the unused disk space is simply counted as available storage.

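The number of mappers depends on the number of input splits, and the job client computes the input splits on the data located in the HDFS input path specified when running the job. So for a single file that occupies two full 64 MB blocks plus 20 MB of a third, it will create 3 input splits: two of 64 MB and one of 20 MB (assuming a 64 MB HDFS block size). Separate small files are not merged: with the default FileInputFormat, splits are computed per file, so each 10 MB file gets its own split.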
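To make the split arithmetic concrete, here is a minimal Python sketch. It is a simplified model, not Hadoop's code: the function name `input_splits` is hypothetical, the split size is assumed equal to a 64 MB block size, and the `slop` parameter is modeled on the 1.1 slop factor used by Hadoop's FileInputFormat when deciding whether to cut another full-size split.

```python
MB = 1024 * 1024

def input_splits(file_sizes, split_size=64 * MB, slop=1.1):
    """Return split lengths in bytes, computed file by file (simplified model)."""
    splits = []
    for size in file_sizes:
        remaining = size
        # Cut full-size splits while the remainder is clearly larger than one split.
        while remaining / split_size > slop:
            splits.append(split_size)
            remaining -= split_size
        # Whatever is left (if anything) becomes the last split of this file.
        if remaining > 0:
            splits.append(remaining)
    return splits

one_big_file = input_splits([148 * MB])          # one 148 MB file
many_small_files = input_splits([10 * MB] * 15)  # fifteen 10 MB files

print([s // MB for s in one_big_file])  # [64, 64, 20]
print(len(many_small_files))            # 15
```

Under these assumptions a single 148 MB file gives three splits, while fifteen separate 10 MB files give fifteen splits, because a split never spans two files in this model.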




Answer 2:


The block size (64 MB or 128 MB) is the maximum size of a single block/split of your file. Even if your file is smaller than 64 MB, it is still treated as one block/split.

Take a block size of 64 MB. If you save a 10 MB file, it takes only 10 MB, and that is the single block of your file. If you save a 70 MB file, it is stored as two blocks, one of 64 MB and one of 6 MB. Nothing requires a block to be a full 64 MB or 128 MB.
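The two cases above can be checked with a short Python sketch. The helper `block_lengths` is a hypothetical name used for illustration only, and it assumes a 64 MB block size; it only models how a file's length is divided into blocks, not replication or placement.

```python
MB = 1024 * 1024

def block_lengths(file_size, block_size=64 * MB):
    """Return the length in bytes of each block a file of file_size would occupy."""
    full_blocks, tail = divmod(file_size, block_size)
    # Full blocks first, then one shorter block for the leftover bytes, if any.
    return [block_size] * full_blocks + ([tail] if tail else [])

print([b // MB for b in block_lengths(10 * MB)])  # [10]
print([b // MB for b in block_lengths(70 * MB)])  # [64, 6]
```

The last block of a file is only as long as the data left over, which is why the 10 MB file uses 10 MB rather than a full 64 MB.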



Source: https://stackoverflow.com/questions/33513782/hdfs-block-size-related
