How do I reduce the size of a bloated Git repo by non-interactively squashing all commits except for the most recent ones?

萌比男神i 2020-12-19 04:03

My Git repo has hundreds of gigabytes of data, say, database backups, so I'm trying to remove old, outdated backups, because they're making everything larger and slower.

3 Answers
  • 2020-12-19 04:36

    An XY Problem

    Note that the original poster has an XY problem: he is asking how to squash his older commits (the Y problem) when his real goal is to reduce the size of his Git repository (the X problem), as I mentioned in the comments:

    Having a lot of commits won't necessarily bloat the size of your Git repo. Git is very efficient at compressing text-based files. Are you sure that the number of commits is the actual problem that leads to your large repo size? A more likely candidate is that you have too many binary assets versioned, which Git doesn't compress as well (or at all) compared to plain text files.
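
    One way to check whether large binary files really are the cause is to list the biggest blobs in the whole history. The following is a minimal sketch, not part of the original comment; the function name largest_blobs and its arguments are made up for illustration, and the sizes it reports are uncompressed object sizes rather than on-disk sizes.

    ```shell
    # List the N largest blobs reachable from any ref, biggest first.
    # Output columns: size in bytes, object id, path.
    # (largest_blobs is a hypothetical helper name, not a Git command.)
    largest_blobs() {
        repo=${1:-.}
        count=${2:-10}
        git -C "$repo" rev-list --objects --all |
            git -C "$repo" cat-file \
                --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
            sed -n 's/^blob //p' |
            sort -k1,1 -n -r |
            head -n "$count"
    }
    ```

    For example, largest_blobs . 20 prints the twenty largest blobs of the repository in the current directory. If the top entries are backups or other binaries, squashing commits alone will not help unless those blobs become unreachable.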

    Despite this, for the sake of completeness, I will also add an alternative solution to Matt McNabb's answer to the Y problem.

    Squashing Hundreds or Thousands of Old Commits

    As the original poster has already noted, using an interactive rebase with the --root flag can be impractical when there are many commits (numbering in the hundreds or thousands), particularly since the interactive rebase won't run efficiently on such a large number of them.

    As Matt McNabb pointed out in his answer, one solution is to use an orphan branch as a new (squashed) root, then to rebase on top of that. Another solution is to use a pair of resets on the branch to achieve the same effect:

    # Save the current state of the branch in a couple of other branches
    git branch beforeReset
    git branch verification
    
    # Also mark where we want to start squashing commits
    git branch oldBase <most_recent_commit_to_squash>
    
    # Temporarily remove the most recent commits from the current branch,
    # because we don't want to squash those:
    git reset --hard oldBase
    
    # Using a soft reset to the root commit will keep all of the changes
    # staged in the index, so you just need to amend those changes to the
    # root commit (--no-edit keeps the root's message and skips the editor):
    git reset --soft <root_commit>
    git commit --amend --no-edit
    
    # Rebase onto the new amended root (this assumes the current branch is
    # master), starting from oldBase and going up to beforeReset
    git rebase --onto master oldBase beforeReset
    
    # Switch back to master and (fast-forward) merge it with beforeReset
    git checkout master
    git merge beforeReset
    
    # Verify that master still contains the same state as before all of the resets
    git diff verification
    
    # Cleanup
    git branch -D beforeReset oldBase verification
    
    # As part of cleanup, since the original poster mentioned that
    # he has a lot of commits that he wants to remove to reduce
    # the size of his repo, garbage collect the old, dangling commits too
    git gc --prune=all
    

    The --prune=all option makes git gc remove every unreachable object immediately, rather than only those older than 2 weeks, which is the default.
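
    One caveat: commits that are still recorded in a reflog count as reachable, so git gc will keep them even with --prune=all. A minimal sketch of a cleanup that expires the reflogs first follows; purge_unreachable is a hypothetical helper name, and the operation is destructive, so it is only worth running once the verification diff above came back empty.

    ```shell
    # Drop every reflog entry, then prune all unreachable objects.
    # Destructive: afterwards the pre-squash commits cannot be recovered.
    # (purge_unreachable is a hypothetical helper name.)
    purge_unreachable() {
        repo=${1:-.}
        git -C "$repo" reflog expire --expire=now --all &&
            git -C "$repo" gc --quiet --prune=now
    }
    ```

    Running git count-objects -vH before and after shows how much space was actually reclaimed.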

  • 2020-12-19 04:50

    The original poster comments:

    if we take a snapshot of a commit 10004, remove all commits before it, and make commit 10004 a root commit, I'll be just fine

    Here is one way to do this, assuming your current work is on a branch called branchname. Whenever I do a large rebase I like to create a temporary tag, both to double-check afterwards that nothing changed and to mark a point I can reset back to if something goes wrong (I'm not sure whether this is standard procedure, but it works for me):

    git tag temp
    
    git checkout 10004
    git checkout --orphan new_root
    git commit -m "set new root 10004"
    
    git rebase --onto new_root 10004 branchname
    
    git diff temp   # verification that it worked with no changes
    git tag -d temp
    git branch -D new_root
    

    To get rid of the old history you'll need to delete all tags and branches that still point into it; then

    git prune
    git gc
    

    will clean it from your repo.
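
    To find out which tags and branches still hold on to the old history, one option is to ask Git for every ref that contains an old commit. This is a small sketch; refs_containing is a hypothetical helper name, and the commit you pass would be any commit from the history you want gone (for instance the old 10004).

    ```shell
    # Print every ref (branch, tag, remote-tracking branch, ...) whose
    # history still includes the given commit. An empty result means no
    # ref keeps that commit alive any more; reflogs might still do so.
    # (refs_containing is a hypothetical helper name.)
    refs_containing() {
        commit=$1
        repo=${2:-.}
        git -C "$repo" for-each-ref --contains "$commit" --format='%(refname)'
    }
    ```

    Each ref it lists has to be deleted or moved before git prune and git gc can drop the old commits.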

    Note that you'll temporarily have two copies of everything, until you have gc'd, but that is unavoidable; even if you do a standard squash and rebase you still have two copies of everything until the rebase finishes.

  • 2020-12-19 04:50

    The fastest approach, counting implementation time, is almost certainly grafts plus a filter-branch, though you might get faster execution with a hand-rolled commit-tree sequence working off rev-list output.
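
    For illustration, a hand-rolled commit-tree sequence could look roughly like the sketch below. It is not taken from this answer: squash_below is a made-up name, it assumes strictly linear history between the two commits (no merges), and it carries over only the author identity, author date, and message of each replayed commit.

    ```shell
    # Rebuild base..tip on top of a brand-new root commit whose tree is
    # the tree of $base. Prints the id of the new tip; no ref is moved.
    # Assumes linear history (no merges) between $base and $tip.
    # (squash_below is a hypothetical helper name.)
    squash_below() {
        base=$1
        tip=$2
        # Parentless commit carrying the full snapshot of $base.
        parent=$(git commit-tree "$base^{tree}" -m "Squashed history up to $base") || return 1
        for c in $(git rev-list --reverse "$base..$tip"); do
            # Reuse each commit's tree unchanged; the message comes from stdin.
            parent=$(
                git log -1 --format=%B "$c" |
                    GIT_AUTHOR_NAME=$(git log -1 --format=%an "$c") \
                    GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$c") \
                    GIT_AUTHOR_DATE=$(git log -1 --format=%aD "$c") \
                    git commit-tree "$c^{tree}" -p "$parent"
            ) || return 1
        done
        printf '%s\n' "$parent"
    }
    ```

    A possible use is git branch squashed "$(squash_below <base> <tip>)", followed by comparing the new branch against the old tip with git diff before moving any existing branch.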

    Rebase is built to apply changes on different content. What you're doing here is preserving contents and intentionally losing the change history that produced them, so pretty much all of rebase's most tedious and slow work is wasted.

    The payload here is, working from your picture,

    echo `git rev-parse H; git rev-parse A` > .git/info/grafts  
    git filter-branch -- --all
    

    Documentation for git rev-parse and git filter-branch.
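
    Newer Git versions treat the .git/info/grafts file as deprecated (to my knowledge since Git 2.18) and offer git replace --graft for the same purpose. The sketch below shows the same payload with a replacement ref instead of a grafts file; splice_history is a hypothetical name, and it should be run from the top level of a scratch clone with a clean working tree.

    ```shell
    # Make $1 look as if $2 were its only parent, then let filter-branch
    # turn that into real, rewritten commits on every ref.
    # (splice_history is a hypothetical helper name.)
    splice_history() {
        # Resolve to full ids now, because the refs move during the rewrite.
        child=$(git rev-parse --verify "$1^{commit}") || return 1
        new_parent=$(git rev-parse --verify "$2^{commit}") || return 1
        git replace --graft "$child" "$new_parent" || return 1
        # The variable skips the warning and delay that newer Git versions
        # add to filter-branch; older versions simply ignore it.
        FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- --all || return 1
        # The replacement ref is no longer needed after the rewrite.
        git replace -d "$child"
    }
    ```

    filter-branch leaves backups of the old refs under refs/original/, so the old commits stay reachable in that clone until those backups are removed.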

    Filter-branch is very careful to be recoverable after a failure at any point, which is certainly safest, but that only helps when simply redoing the work wouldn't be faster and easier if things go south on you. Since failures are rare and restarts are usually cheap, the thing to do is an "unsafe" but very fast operation that is all but certain to work. The best option here is to do the work on a tmpfs (the closest equivalent I know of on Windows is a ramdisk such as ImDisk), which will be blazing fast and won't touch your main repo until you're sure you've got the results you want.

    So on Windows, say T:\wip is on a ramdisk. Note that the clone here copies nothing. As well as reading the docs on git clone's --shared option, do examine the clone's innards to see the real effect; it's very straightforward.
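
    As a starting point for examining the clone's innards, the sketch below creates a shared clone and prints where it borrows its objects from. The name inspect_shared_clone and its two path arguments are made up for illustration.

    ```shell
    # Create a --shared scratch clone and show how it borrows objects.
    # (inspect_shared_clone is a hypothetical helper name.)
    inspect_shared_clone() {
        src=$1
        dst=$2
        git clone --quiet --shared --no-checkout "$src" "$dst" || return 1
        # The clone only records the path of the source object store here.
        cat "$dst/.git/objects/info/alternates"
        # Loose and packed object counts of the clone itself.
        git -C "$dst" count-objects -v
    }
    ```

    On a fresh shared clone the count and in-pack values are zero, which is the "copies nothing" effect: every object is read from the source repository through the alternates entry.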

    # switch to a lightweight wip clone on a tmpfs
    git clone --shared --no-checkout . /t/wip/filterwork
    cd /t/wip/filterwork   # "cd !$" does the same, but only in an interactive shell
    
    # graft out the unwanted commits
    echo `git rev-parse $L; git rev-parse $A` >.git/info/grafts
    git filter-branch -- --all
    
    # check that the repo history looks right
    git log --graph --decorate --oneline --all
    
    # all done with the splicing, filter-branch has integrated it
    rm .git/info/grafts
    
    # push the rewritten histories back
    git push origin --all --force
    

    There are enough possible variations on what you might be wanting to do and what might be in your repo that almost any of the options on these commands might be useful. The above is tested and will do what it says it does, but that might not be exactly what you want.
