Disk usage of files whose names match a regex, in Linux?

耶瑟儿~ 2020-12-23 21:19

So, in many situations I wanted a way to know how much of my disk space is used by what, so I know what to get rid of, convert to another format, store elsewhere (such as da

6 answers
  • 2020-12-23 21:47

    du is my favorite answer. If you have a fixed filesystem structure, you can use:

    du -hc *.bak
    

    If you need to add subdirs, just add:

    du -hc *.bak **/*.bak **/**/*.bak
    

    etc etc

    However, this isn't a very useful command, so using your find:

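    TOTAL=0; for I in $(find . -name '*.bak'); do TOTAL=$((TOTAL + $(du -sk "$I" | awk '{print $1}'))); done; echo "$TOTAL"
    

    That will echo the total disk usage of all the files you find, in KiB. The -k flag makes du report 1024-byte units; without it, du defaults to 512-byte or 1024-byte blocks depending on the implementation, not bytes. Because the loop splits find's output on whitespace, it miscounts file names that contain spaces.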
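
The loop above splits find's output on whitespace, so names with spaces are miscounted. A minimal sketch that sidesteps this by letting find hand the names to du itself and summing the per-file lines with awk. The function name sum_bak_kib is made up for illustration, and -type f is an added assumption that only regular files should count:

```shell
# Hypothetical helper: total disk usage, in KiB, of all *.bak files under $1
# (defaults to the current directory).
sum_bak_kib() {
    # find passes the names straight to du, so spaces in names are harmless.
    # du -k prints "<KiB><TAB><path>" per file; awk adds up the first column.
    # "s + 0" makes awk print 0 instead of an empty line when nothing matches.
    find "${1:-.}" -type f -name '*.bak' -exec du -k {} + |
        awk '{s += $1} END {print s + 0}'
}
```

Usage would be something like sum_bak_kib /where/to/look. File names that contain newlines could still skew the sum, since du prints one line per path.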

    Hope that helps.

  • 2020-12-23 21:49

    The previous solutions didn't work properly for me (I had trouble piping du) but the following worked great:

    find path/to/directory -iregex ".*\.bak$" -exec du -csh '{}' + | tail -1
    

    The iregex option is a case insensitive regular expression. Use regex if you want it to be case sensitive.

    If you aren't comfortable with regular expressions, you can use the iname or name flags (the former being case insensitive):

    find path/to/directory -iname "*.bak" -exec du -csh '{}' + | tail -1
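
To see how the four tests differ, here is a small sketch. list_matches is a hypothetical helper, and -regex, -iregex and -iname are extensions found in GNU and BSD find rather than in POSIX. The commented calls assume a directory that contains the two files ./A.BAK and ./b.bak:

```shell
# Hypothetical helper: list regular files under $1 that match pattern $3,
# using the find test named in $2 (regex, iregex, name or iname).
list_matches() {
    find "$1" -type f "-$2" "$3" | sort
}

# -regex and -iregex match against the whole path, so the pattern needs ".*":
#   list_matches . regex  '.*\.bak$'   # lists ./b.bak but not ./A.BAK
#   list_matches . iregex '.*\.bak$'   # lists both files
# -name and -iname match against the base name only, using a glob:
#   list_matches . name   '*.bak'      # lists ./b.bak only
#   list_matches . iname  '*.bak'      # lists both files
```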
    

    In case you want the size of every match (rather than just the combined total), simply leave out the piped tail command:

    find path/to/directory -iname "*.bak" -exec du -csh '{}' +
    

    These approaches avoid the subdirectory problem in @MaddHackers' answer.

    Hope this helps others in the same situation (in my case, finding the size of all DLLs in a .NET solution).

  • 2020-12-23 21:52

    The accepted reply suggests using

    find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1
    

    but that doesn't work on my system, as du there doesn't know a --files0-from option. Only GNU du knows that option. It is not part of the POSIX standard (so you won't find it in FreeBSD or macOS), nor will you find it on BusyBox-based Linux systems (e.g. most embedded Linux systems) or on any other Linux system that does not use the GNU version of du.

    Then there's a reply suggesting to use:

    find path/to/directory -iregex ".*\.bak$" -exec du -csh '{}' + | tail -1
    

    This solution works as long as not too many files are found. The + means that find tries to call du with as many hits as possible in a single call. However, a system supports only a limited number of arguments (N) per command, and if there are more hits than that, find calls du multiple times, splitting the hits into groups of at most N items each. In that case the result is wrong, because tail -1 shows only the total of the last du call.
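
One way to check whether this affects a given tree is to count how many paths each invocation receives. The sketch below swaps du for a tiny sh -c command that prints its argument count; count_exec_batches is a hypothetical name. One output line means a single batch, so tail -1 would be safe; several lines mean du would have been run several times:

```shell
# Hypothetical helper: print how many paths find hands to each invocation
# of the command given to -exec ... +
count_exec_batches() {
    # $1 = directory to search, $2 = -name glob (for example '*.bak')
    # The extra "sh" fills $0 of the inline script, so "$#" counts only paths.
    find "$1" -type f -name "$2" -exec sh -c 'echo "$#"' sh {} +
}
```

For example, count_exec_batches path/to/directory '*.bak' prints nothing at all when no file matches, because find then never runs the command.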

    Finally, there is an answer using stat and awk, which is a nice way to do it, but it relies on shell globbing in a way that only Bash 4.x or later supports. It will not work with older versions, and whether it works with other shells is unpredictable.

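    A more portable solution, which does not suffer from the argument limit and does not depend on any shell feature, is shown below. Note that it is not strictly POSIX: neither stat nor find's -regex test is specified by POSIX, and -f "%z" is the BSD/macOS stat syntax. With GNU stat on Linux, use -c "%s" instead: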

    find . -regex '.*\.bak' -exec stat -f "%z" {} \; | awk '{s += $1} END {print s}'
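
Since the stat flags differ between GNU and BSD/macOS, one option is to probe for them at run time. This is only a sketch under that assumption; sum_regex_bytes is a hypothetical name, and -type f is added so that directories are not counted:

```shell
# Hypothetical helper: sum of the sizes, in bytes, of regular files under $1
# whose whole path matches the regex $2.
sum_regex_bytes() {
    dir=$1 re=$2   # note: plain assignments, so these are global in POSIX sh
    # Probe for the flag style: GNU and BusyBox stat accept -c '%s',
    # BSD/macOS stat rejects -c and uses -f '%z' instead.
    if stat -c '%s' / >/dev/null 2>&1; then
        set -- -c '%s'
    else
        set -- -f '%z'
    fi
    # No "total" line is involved, so it does not matter how many times
    # find has to invoke stat; awk simply adds up every size it sees.
    find "$dir" -type f -regex "$re" -exec stat "$@" {} + |
        awk '{s += $1} END {print s + 0}'
}
```

Usage would be something like sum_regex_bytes . '.*\.bak'.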
    
  • 2020-12-23 21:57

    If you're OK with glob-patterns and you're only interested in the current directory:

    stat -c "%s" *.bak | awk '{sum += $1} END {print sum}'
    

    or

    sum=0
    while read size; do (( sum += size )); done < <(stat -c "%s" *.bak)
    echo $sum
    

    The %s directive to stat gives bytes not kilobytes.

    If you want to descend into subdirectories, with bash version 4, you can shopt -s globstar and use the pattern **/*.bak

  • 2020-12-23 22:04

    Run this in a Bourne-compatible shell that supports $'\n' quoting (such as bash) to declare a function that calculates the sum of the sizes of all files matching a regex pattern under the current directory:

    sizeofregex() { IFS=$'\n'; for x in $(find . -regex "$1" 2> /dev/null); do du -sk "$x" | cut -f1; done | awk '{s+=$1} END {print s}' | sed 's/^$/0/'; unset IFS; }
    

    (Alternatively, you can put it in a script.)

    Usage:

    cd /where/to/look
    sizeofregex 'myregex'
    

    The result will be a number (in KiB), including 0 (if there are no files that match your regex).

    If you do not want it to look in other filesystems (say you want to look for all .so files under /, which is a mount of /dev/sda1, but not under /home, which is a mount of /dev/sdb1), add the -xdev option to find in the function above.
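
A sketch of what such a variant might look like. sizeofregex_xdev is a hypothetical name, and unlike the function above it lets find pass the names to du directly and counts regular files only (the -type f is an added assumption):

```shell
# Hypothetical variant of sizeofregex: stays on the filesystem of the current
# directory (-xdev) and sums the disk usage, in KiB, of matching regular files.
sizeofregex_xdev() {
    # $1 = regex, matched against the whole path (e.g. '.*\.so')
    # -xdev comes first so that find does not descend into other mounts.
    find . -xdev -type f -regex "$1" -exec du -sk {} + 2>/dev/null |
        awk '{s += $1} END {print s + 0}'
}
```

As before, cd to the directory you want to search and call it with the regex, for example sizeofregex_xdev '.*\.so'. It prints 0 when nothing matches.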

  • 2020-12-23 22:08

    I suggest something like: find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1

    Some notes:

    • The -print0 option for find and --files0-from for du are there to avoid issues with whitespace in file names
    • The regular expression is matched against the whole path, e.g. ./dir1/subdir2/file.bak, not just file.bak, so if you modify it, take that into account
    • I used the h flag for du to produce a "human-readable" format, but if you want to parse the output, you may be better off with k (always report kilobytes)
    • If you remove the tail command, you will additionally see the sizes of particular files and directories
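
Putting the notes above together, a parse-friendly variant might look like the sketch below. bak_total_kib is a hypothetical name, and the pipeline assumes GNU find and GNU du, since --files0-from is a GNU extension:

```shell
# Hypothetical helper: print only the total, in KiB, for *.bak paths under $1
# (defaults to the current directory). Requires GNU du for --files0-from.
bak_total_kib() {
    # -print0 and --files0-from=- pass the names NUL-separated, so whitespace
    # in file names is safe. -k gives plain KiB numbers instead of -h strings.
    find "${1:-.}" -regex '.*\.bak' -print0 |
        du --files0-from=- -ck |
        tail -n 1 |
        cut -f1
}
```

The last line of du -c output is the number, a tab, and the word "total", so cut -f1 leaves just the number for use in scripts.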

    Sidenote: a nice GUI tool for finding out who ate your disk space is FileLight. It doesn't do regexes, but is very handy for finding big directories or files clogging your disk.
