I wanted to find the 10 largest files in my repository. The script I came up with is as follows:
REP_HOME_DIR=
max_huge_files=
You can also use du
- Example: du -ah objects | sort -n -r | head -n 10
. du to get the size of the objects, sort
them and then picking the top 10 using head
.
You can use find
to find files larger than a given threshold, then pass them to git ls-files
to exclude untracked files (e.g. build output):
find * -type f -size +100M -print0 | xargs -0 git ls-files
Adjust 100M (100 megabytes) as needed until you get results.
Minor caveat: this won't search top-level "hidden" files and folders (i.e. those whose names start with .
). This is because I used find *
instead of just find
to avoid searching the .git
database.
I was having trouble getting the sort -n
solutions to work (on Windows under Git Bash). I'm guessing it's due to indentation differences when xargs batches arguments, which xargs -0
seems to do automatically to work around Windows' command-line length limit of 32767.
On Windows, I started with @pix64's answer (thanks!) and modified it to handle files with spaces in the path, and also to output objects instead of strings:
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype)|%(objectname)|%(objectsize)|%(rest)' |
Where-Object {$_ -like "blob*"} |
% { $tokens = $_ -split "\|"; [pscustomobject]@{ Hash = $tokens[1]; Size = [int]($tokens[2]); Name = $tokens[3] } } |
Sort-Object -Property Size -Descending |
Select-Object -First 50
Even better, if you want to output the file sizes with nice file size units, you can add the DisplayInBytes function from here to your environment https://stackoverflow.com/a/40887001/892770, and then pipe the above to:
Format-Table Hash, Name, @{Name="Size";Expression={ DisplayInBytes($_.Size) }}
This gives you output like:
Hash Name Size
---- ---- ----
f51371aa843279a1efe45ff14f3dc3ec5f6b2322 types/react-native-snackbar-component/react 95.8 MB
84f3d727f6b8f99ab4698da51f9e507ae4cd8879 .ntvs_analysis.dat 94.5 MB
17d734397dcd35fdbd715d29ef35860ecade88cd fhir/fhir-tests.ts 11.5 KB
4c6a027cdbce093fd6ae15e65576cc8d81cec46c fhir/fhir-tests.ts 11.4 KB
Lastly, if you'd like to get all the largest file types, you can do so with:
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype)|%(objectname)|%(objectsize)|%(rest)' |
Where-Object {$_ -like "blob*"} |
% { $tokens = $_ -split "\|"; [pscustomobject]@{ Size = [int]($tokens[2]); Extension = [System.IO.Path]::GetExtension($tokens[3]) } } |
Group-Object -Property Extension |
% { [pscustomobject]@{ Name = $_.Name; Size = ($_.Group | Measure-Object Size -Sum).Sum } } |
Sort-Object -Property Size -Descending |
select -First 20 -Property Name, @{Name="Size";Expression={ DisplayInBytes($_.Size) }}