Namenode file quantity limit

礼貌的吻别 2020-12-05 16:40

Does anyone know how many bytes each file occupies in the NameNode of HDFS? I want to estimate how many files can be stored in a single NameNode with 32 GB of memory.

3 answers
  •  天命终不由人
    2020-12-05 17:16

    Cloudera recommends 1 GB of NameNode heap space per million blocks. 1 GB for every million files is less conservative but should work too.
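    Applied to the 32 GB in the question, the two rules of thumb reduce to simple arithmetic. This is a rough sketch, not a sizing tool: it assumes the full 32 GB is usable as NameNode heap (in practice the JVM and OS need some of it), and `avg_blocks_per_file` and both function names are hypothetical.

```python
# Rough NameNode capacity from the Cloudera rules of thumb quoted above.

def files_conservative(heap_gb: float, avg_blocks_per_file: float) -> float:
    """1 GB per million blocks: heap_gb million blocks, shared among files."""
    total_blocks = heap_gb * 1_000_000
    return total_blocks / avg_blocks_per_file

def files_per_file_rule(heap_gb: float) -> float:
    """1 GB per million files (less conservative)."""
    return heap_gb * 1_000_000

heap_gb = 32  # assumption: all 32 GB is given to the NameNode heap

print(files_conservative(heap_gb, 1))  # small files, one block each
print(files_conservative(heap_gb, 2))  # files averaging two blocks
print(files_per_file_rule(heap_gb))
```

The conservative rule depends on how many blocks the average file has, which is why it takes that extra (assumed) parameter.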

    Also, you don't need to multiply by the replication factor; the accepted answer is wrong on this point.

    Using the default block size of 128 MB, a file of 192 MB is split into two block files, one 128 MB file and one 64 MB file. On the NameNode, namespace objects are measured by the number of files and blocks. The same 192 MB file is represented by three namespace objects (1 file inode + 2 blocks) and consumes approximately 450 bytes of memory.

    One data file of 128 MB is represented by two namespace objects on the NameNode (1 file inode + 1 block) and consumes approximately 300 bytes of memory. By contrast, 128 files of 1 MB each are represented by 256 namespace objects (128 file inodes + 128 blocks) and consume approximately 38,400 bytes.
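    The object counting above fits in a small function. This sketch assumes the ~150 bytes per namespace object used in this answer and the default 128 MB block size; the real per-object cost varies with Hadoop version and file attributes. `namespace_objects` and `heap_bytes` are hypothetical helper names.

```python
import math

BYTES_PER_OBJECT = 150  # approximate heap cost per namespace object (from the answer)
BLOCK_SIZE_MB = 128     # default HDFS block size

def namespace_objects(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """1 file inode plus one block per full or partial block_size_mb chunk."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return 1 + blocks

def heap_bytes(file_sizes_mb, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Approximate NameNode heap used by a collection of files."""
    objects = sum(namespace_objects(s, block_size_mb) for s in file_sizes_mb)
    return objects * BYTES_PER_OBJECT

print(heap_bytes([192]))      # 3 objects -> 450
print(heap_bytes([128]))      # 2 objects -> 300
print(heap_bytes([1] * 128))  # 256 objects -> 38400
```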

    Replication affects disk space but not memory consumption. Replication changes the amount of storage required for each block but not the number of blocks. If one block file on a DataNode, represented by one block on the NameNode, is replicated three times, the number of block files is tripled but not the number of blocks that represent them.
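    The disk-versus-heap distinction can be illustrated with the same simplified model (about 150 bytes per namespace object, 128 MB blocks). Following the answer, the model treats heap as independent of replication; `footprint` is a hypothetical helper.

```python
import math

BYTES_PER_OBJECT = 150  # approximate heap cost per namespace object
BLOCK_SIZE_MB = 128     # default HDFS block size

def footprint(file_size_mb: float, replication: int):
    """Return (raw disk MB across DataNodes, NameNode heap bytes) for one file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    disk_mb = file_size_mb * replication     # each replica stores a full copy
    heap = (1 + blocks) * BYTES_PER_OBJECT   # replication adds no blocks
    return disk_mb, heap

for r in (1, 3):
    print(r, footprint(192, r))
```

Tripling replication triples the disk figure while the heap figure stays the same.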

    Examples:

    1. 1 x 1,024 MB file: 1 file inode + 8 blocks (1,024 MB / 128 MB). Total = 9 objects × 150 bytes = 1,350 bytes of heap memory
    2. 8 x 128 MB files: 8 file inodes + 8 blocks. Total = 16 objects × 150 bytes = 2,400 bytes of heap memory
    3. 1,024 x 1 MB files: 1,024 file inodes + 1,024 blocks. Total = 2,048 objects × 150 bytes = 307,200 bytes of heap memory
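    The three examples can be reproduced with the same arithmetic and inverted to get an upper bound for the 32 GB question. This sketch assumes 32 GB means 32 × 1024³ bytes, that every byte of heap goes to namespace objects at 150 bytes each, and that all files are 1 MB; `heap_for` and `max_files` are hypothetical names.

```python
import math

BYTES_PER_OBJECT = 150  # approximate heap cost per namespace object
BLOCK_SIZE_MB = 128     # default HDFS block size

def objects_per_file(size_mb: float) -> int:
    """1 inode + ceil(size / block size) blocks."""
    return 1 + math.ceil(size_mb / BLOCK_SIZE_MB)

def heap_for(count: int, size_mb: float) -> int:
    """Heap bytes for `count` files of `size_mb` each."""
    return count * objects_per_file(size_mb) * BYTES_PER_OBJECT

def max_files(heap_bytes: int, size_mb: float) -> int:
    """Upper bound on files of size_mb that fit (ignores all other heap use)."""
    return heap_bytes // (objects_per_file(size_mb) * BYTES_PER_OBJECT)

print(heap_for(1, 1024))   # 1350
print(heap_for(8, 128))    # 2400
print(heap_for(1024, 1))   # 307200
print(max_files(32 * 1024**3, 1))
```

The last line gives roughly 114.5 million files, well above the ~32 million suggested by the Cloudera rule of thumb. The gap is unsurprising, since this bound ignores everything else the NameNode keeps on its heap and any GC headroom.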

    More examples can be found in the original Cloudera article.
