Why is the output of `du` often so different from `du -b`?

温柔的废话 2020-11-29 05:47

why is the output of du often so different from du -b? -b is shorthand for --apparent-size --block-size=1. only using

5 answers
  •  轻奢々 (original poster)
    2020-11-29 06:30

    Minimal block granularity example

    Let's play a bit to see what is going on.

    mount tells me I'm on an ext4 partition mounted at /.

    I find its block size with:

    stat -fc %s .
    

    which gives:

    4096
    

    Now let's create some files with sizes 1 4095 4096 4097:

    #!/usr/bin/env bash
    for size in 1 4095 4096 4097; do
      dd if=/dev/zero of=f bs=1 count="${size}" status=none
      echo "size     ${size}"
      echo "real     $(du --block-size=1 f)"
      echo "apparent $(du --block-size=1 --apparent-size f)"
      echo
    done
    

    and the results are:

    size     1
    real     4096   f
    apparent 1      f
    
    size     4095
    real     4096   f
    apparent 4095   f
    
    size     4096
    real     4096   f
    apparent 4096   f
    
    size     4097
    real     8192   f
    apparent 4097   f
    

    So any file of 1 to 4096 bytes actually takes up 4096 bytes on disk.

    Then, as soon as the size reaches 4097, usage goes up to 8192, which is 2 * 4096.

    It is clear then that the filesystem allocates space in whole blocks of 4096 bytes, so real disk usage is the file size rounded up to the next block boundary.
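
    The rounding seen above can be reproduced with plain shell arithmetic. This is only a sketch: allocated_size is a hypothetical helper name, the 4096 default is the block size measured earlier, and it assumes every non-empty file occupies whole blocks (no inline data or tail packing).

    ```shell
    # Round a byte count up to the next multiple of the block size.
    # Assumption: the filesystem allocates whole blocks only.
    # $1 = file size in bytes, $2 = block size (defaults to 4096).
    allocated_size() {
      local size=$1 block=${2:-4096}
      echo $(( (size + block - 1) / block * block ))
    }

    # Same sizes as in the experiment above.
    for size in 1 4095 4096 4097; do
      printf '%s -> %s\n' "$size" "$(allocated_size "$size")"
    done
    ```

    For the four sizes tested, this gives 4096, 4096, 4096 and 8192, matching the "real" column.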

    What happens to sparse files?

    I haven't investigated what the exact on-disk representation is, but it is clear that --apparent-size counts the holes, while the default output counts only the blocks actually allocated.

    This can lead to apparent sizes being larger than actual disk usage.

    For example:

    dd seek=1G if=/dev/zero of=f bs=1 count=1 status=none
    du --block-size=1 f
    du --block-size=1 --apparent-size f
    

    gives:

    8192    f
    1073741825      f
    

    Related: How to test if sparse file is supported
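
    A rough way to spot a file like the one above is to compare its allocated bytes with its apparent size. The sketch below assumes GNU stat; looks_sparse and check_file are hypothetical helper names, and the comparison is only a heuristic, since compression or inline data can also make allocated space smaller than the apparent size.

    ```shell
    # Succeeds (exit status 0) when fewer bytes are allocated than the
    # apparent size. $1 = apparent size in bytes, $2 = allocated bytes.
    looks_sparse() {
      local apparent=$1 allocated=$2
      [ "$allocated" -lt "$apparent" ]
    }

    # GNU stat: %s = apparent size in bytes, %b = number of allocated
    # blocks, %B = size in bytes of each block reported by %b.
    check_file() {
      set -- $(stat -c '%s %b %B' -- "$1")
      if looks_sparse "$1" $(( $2 * $3 )); then
        echo "probably sparse"
      else
        echo "not sparse"
      fi
    }
    ```

    With the numbers from the example, looks_sparse 1073741825 8192 succeeds, while looks_sparse 4097 8192 fails. Running check_file f on the file created above should report it as probably sparse on a filesystem that supports holes.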

    What to do if I want to store a bunch of small files?

    Some possibilities are:

    • use a database instead of a filesystem: Database vs File system storage
    • use a filesystem that supports block suballocation
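
    Before choosing either option, it may help to measure how much space the small files actually lose to block rounding. This is a sketch assuming GNU find, whose -printf %b prints disk usage in 512-byte blocks; summarize_usage is a hypothetical helper name.

    ```shell
    # Sum apparent and allocated bytes from "size blocks" pairs on stdin,
    # where size is in bytes and blocks are 512-byte units.
    summarize_usage() {
      awk '{ apparent += $1; allocated += $2 * 512 }
           END { printf "apparent=%d allocated=%d\n", apparent, allocated }'
    }

    # Usage, for all regular files under the current directory:
    #   find . -type f -printf '%s %b\n' | summarize_usage
    ```

    A large gap between the two totals suggests that block granularity is costing real space for that directory.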

    Bibliography:

    • https://serverfault.com/questions/565966/which-block-sizes-for-millions-of-small-files
    • https://askubuntu.com/questions/641900/how-file-system-block-size-works

    Tested on Ubuntu 16.04.
