How to find the N largest files in a git repository?

青春惊慌失措 2020-12-12 14:33

I wanted to find the 10 largest files in my repository. The script I came up with is as follows:

REP_HOME_DIR=
max_huge_files=         


        
9 Answers
  • 2020-12-12 15:05

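    You can also use du, for example: du -ak objects | sort -n -r | head -n 10 (run from inside the .git directory). du reports the size of each object file in kilobytes, sort orders the sizes numerically in descending order, and head picks the top 10. Use -k rather than -h here, because sort -n cannot compare human-readable sizes such as 1.5G and 200K; with GNU sort you can keep -h if you sort with sort -h -r instead.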

  • 2020-12-12 15:11

    You can use find to find files larger than a given threshold, then pass them to git ls-files to exclude untracked files (e.g. build output):

    find * -type f -size +100M -print0 | xargs -0 git ls-files
    

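    Adjust 100M (100 MiB) as needed until you get results. Note that if find matches nothing, GNU xargs still runs git ls-files once with no arguments, which lists every tracked file; pass -r to xargs to avoid that.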

    Minor caveat: this won't search top-level "hidden" files and folders (i.e. those whose names start with .). This is because I used find * instead of just find to avoid searching the .git database.

    I had trouble getting the sort -n solutions to work on Windows under Git Bash. My guess is that the cause is indentation differences when xargs batches arguments, which xargs -0 appears to do automatically to stay under Windows' command-line length limit of 32767 characters.
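
The hidden-file caveat above can be avoided by searching from . and pruning .git explicitly. The following is a minimal sketch under that assumption, not a drop-in replacement: find_large_files is a hypothetical helper name, and -print0, xargs -0 and xargs -r are GNU/BSD extensions rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR ($1) larger than THRESHOLD ($2),
# including hidden ones, while skipping anything named .git.
# THRESHOLD uses find's -size syntax, e.g. 100M or 1000c (bytes).
find_large_files() {
  find "$1" -name .git -prune -o -type f -size "+$2" -print0
}

# Usage inside a repository (keeps only tracked files):
#   find_large_files . 100M | xargs -0 -r git ls-files
# -r (GNU xargs) skips git ls-files when nothing matched; without it,
# an empty match would list every tracked file.
```

Because .git is pruned by name rather than avoided through the find * glob, top-level dotfiles and dot-directories are searched as well.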

  • 2020-12-12 15:12

    On Windows, I started with @pix64's answer (thanks!) and modified it to handle files with spaces in the path, and also to output objects instead of strings:

    git rev-list --objects --all |
     git cat-file --batch-check='%(objecttype)|%(objectname)|%(objectsize)|%(rest)' |
     Where-Object {$_ -like "blob*"} |
     % { $tokens = $_ -split "\|"; [pscustomobject]@{ Hash = $tokens[1]; Size = [long]($tokens[2]); Name = $tokens[3] } } |
     Sort-Object -Property Size -Descending |
     Select-Object -First 50
    

    To show the sizes in human-readable units, add the DisplayInBytes function from https://stackoverflow.com/a/40887001/892770 to your environment, then pipe the output of the pipeline above to:

    Format-Table Hash, Name, @{Name="Size";Expression={ DisplayInBytes($_.Size) }}
    

    This gives you output like:

    Hash                                     Name                                        Size
    ----                                     ----                                        ----
    f51371aa843279a1efe45ff14f3dc3ec5f6b2322 types/react-native-snackbar-component/react 95.8 MB
    84f3d727f6b8f99ab4698da51f9e507ae4cd8879 .ntvs_analysis.dat                          94.5 MB
    17d734397dcd35fdbd715d29ef35860ecade88cd fhir/fhir-tests.ts                          11.5 KB
    4c6a027cdbce093fd6ae15e65576cc8d81cec46c fhir/fhir-tests.ts                          11.4 KB
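
For comparison, the object-listing pipeline at the start of this answer can be sketched for a POSIX shell. This is an illustration under assumptions, not @pix64's original: largest_blobs is a hypothetical helper name, and the format string uses spaces instead of | as the separator, which is safe for sorting because the path is the last field.

```shell
#!/bin/sh
# Hypothetical helper: read "type hash size path" lines on stdin,
# keep only blobs, and print the N largest (default 10).
largest_blobs() {
  awk '$1 == "blob"' | sort -k3,3nr | head -n "${1:-10}"
}

# Usage inside a repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 10
```

Paths containing spaces are kept intact, since awk prints each matching line unchanged and sort only looks at the third field.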
    

    Lastly, if you'd like to see which file types (by extension) take up the most space in total, you can do so with:

    git rev-list --objects --all |
     git cat-file --batch-check='%(objecttype)|%(objectname)|%(objectsize)|%(rest)' |
     Where-Object {$_ -like "blob*"} |
     % { $tokens = $_ -split "\|"; [pscustomobject]@{ Size = [long]($tokens[2]); Extension = [System.IO.Path]::GetExtension($tokens[3]) } } |
     Group-Object -Property Extension |
     % { [pscustomobject]@{ Name = $_.Name; Size = ($_.Group | Measure-Object Size -Sum).Sum } } |
     Sort-Object -Property Size -Descending |
     select -First 20 -Property Name, @{Name="Size";Expression={ DisplayInBytes($_.Size) }}
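
The per-extension totals can be sketched in a POSIX shell with awk as well. Again this is only an illustration: size_by_extension is a hypothetical helper name, the input is assumed to be space-separated "type hash size path" lines, and files without an extension are grouped under the made-up label (none).

```shell
#!/bin/sh
# Hypothetical helper: read "type hash size path" lines on stdin and
# print the total blob size per file extension, largest first.
size_by_extension() {
  awk '
    $1 == "blob" {
      path = $0
      # Drop the first three fields (type, hash, size) to recover the path.
      sub(/^[^ ]+ +[^ ]+ +[^ ]+ +/, "", path)
      n = split(path, parts, "/")
      file = parts[n]
      if (match(file, /\.[^.]*$/)) ext = substr(file, RSTART)
      else ext = "(none)"
      total[ext] += $3
    }
    END { for (e in total) printf "%.0f %s\n", total[e], e }
  ' | sort -k1,1nr
}

# Usage inside a repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     size_by_extension | head -n 20
```

The totals are printed in bytes; converting them to readable units is left out to keep the sketch short.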
    