Clean up large files on git server

傲寒 2020-12-17 03:40

Someone accidentally committed some large (multi-GB) binaries to my self-hosted GitLab repository, and now every time someone tries to pull from the repository the server gets bogged down.

3 Answers
  • 2020-12-17 03:53

    Had the same problem and the process to get it resolved was quite involved.

    We run the community-maintained sameersbn/gitlab 11.4.5 in a Docker container. I didn't want to install BFG inside the container, so I opted to perform the changes locally.

    # Install the BFG tool, e.g. on macOS via Homebrew
    brew install bfg
    
    # Clone repo locally
    cd ~/Development
    git clone --mirror ssh://git@server.com:22/some/dir/myrepo.git
    
    # Clean the repo
    bfg --delete-files \*.pdf myrepo.git
    cd myrepo.git
    rm -rf refs/original/   # the mirror clone is bare, so there is no .git/ subdirectory
    git reflog expire --expire=now --all
    git gc --prune=now
    git gc --aggressive --prune=now
    
    # Upload to container-host, e.g. via FileZilla
    
    # Connect to the container-host via ssh
    
    # Rename the original directory in the container, to have a backup
    docker exec -it gitlab /bin/bash
    mv /home/git/data/repositories/some/dir/myrepo.git /home/git/data/repositories/some/dir/myrepo.git.mybackup
    exit
    
    # Copy from container-host into container
    docker cp /root/Documents/myrepo.git gitlab:/home/git/data/repositories/some/dir/myrepo.git
    
    # Fix permissions in container
    docker exec -it gitlab /bin/bash
    cd /home/git/data/repositories/some/dir/myrepo.git
    find . -type f -print0 | xargs -0 chown git:git
    chown -R git:git /home/git/data/repositories/some/dir/myrepo.git
    chmod 770 /home/git/data/repositories/some/dir/myrepo.git
    
    # Re-create the "hooks" subdir with some symlinks in the repo
    cd /home/git/gitlab/bin
    ./rake gitlab:shell:create_hooks
    
    # Clear Redis cache (unclear if needed)
    ./rake cache:clear
    exit
    
    # Clone the changed repo locally again, also tell everyone who got a copy to clone again (history is broken now)
    
    # Then do a commit to the repo, to hit the hook and trigger a size recheck
    
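    To sanity-check the steps above, it helps to compare the repository size and the largest remaining blobs before and after the cleanup. A hedged sketch (it builds a throwaway repo so the commands run standalone; in practice, run them inside the mirrored myrepo.git):

    ```shell
    # Illustrative only: verify that the bfg + gc steps actually shrank the
    # repo. A throwaway repo is created here so this runs standalone; in
    # practice, run these commands inside the mirrored myrepo.git.
    set -e
    repo=$(mktemp -d)
    git init -q "$repo" && cd "$repo"
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "init"

    # "size-pack" in this output is the on-disk size of the pack files;
    # compare it before and after the cleanup
    git count-objects -vH

    # List the five largest blobs still reachable; after the cleanup the
    # deleted *.pdf files should no longer show up here
    git rev-list --objects --all \
      | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
      | awk '$1 == "blob" {print $3, $4}' \
      | sort -rn | head -5
    ```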
  • 2020-12-17 04:07

    Be aware that this will break history for everyone who has cloned or pulled the branch since that commit, so you will have to tell them to re-clone.

    What you need to do is rewrite the branch with an interactive rebase, dropping the problematic commit.

    First, rebase in your local repository:

    git rebase -i problematicCommit~1
    

    This will open your default editor. Delete the line for problematicCommit, then save and close the file. Note that this drops the whole commit, including any other changes it introduced.

    Next, delete the branch on the remote:

    git push origin :nameOfTheBranch
    

    Note the colon before the branch name: pushing an empty source ref deletes the remote branch.

    Finally, push the branch to the remote again:

    git push origin nameOfTheBranch
    

    This recreates the remote branch without the problematic commit, and new clones will be fast again.
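    As a side note, the delete-then-recreate pair of pushes above can be collapsed into a single force push. A hedged, standalone sketch (it sets up a throwaway local "origin" purely for illustration):

    ```shell
    # Illustrative alternative to deleting and re-creating the remote branch:
    # after rewriting history, publish it with a single force push.
    # --force-with-lease refuses to overwrite commits you haven't fetched,
    # which is safer than a plain --force.
    set -e
    base=$(mktemp -d)
    git init -q --bare "$base/origin.git"
    git init -q "$base/work" && cd "$base/work"
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "first"
    git remote add origin "$base/origin.git"
    git push -q origin HEAD:main

    # Rewrite history locally (stand-in for the interactive rebase above)
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --amend --allow-empty -m "rewritten"

    # Publish the rewritten branch in one step
    git push --force-with-lease origin HEAD:main
    ```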

    If you still notice that the repository is slow, you can erase the unreachable objects (e.g. the ones containing the big file).

    First, remove any tags and branches that still point at the old commits. This matters because objects can only be erased once nothing references them.

    Then, following VonC's answer (stackoverflow.com/a/28720432/6309), run in your repository and on the remote:

    git gc
    git repack -Ad
    git prune
    
  • 2020-12-17 04:09

    As the OP Karl confirms in the comments, running BFG repo cleaner on the server side (directly in the bare repo) is enough to remove the large binaries.

    Follow that with (as mentioned in "Git - Delete a Blob"):

    rm -rf refs/original/   # in a bare repo there is no .git/ subdirectory
    git reflog expire --expire=now --all
    git gc --prune=now
    git gc --aggressive --prune=now
    

    and also (see "git gc --aggressive vs git repack"):

    git gc
    git repack -Ad      # kills in-pack garbage
    git prune           # kills loose garbage
    

    You should end up with a slimmer and smaller bare repo.
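    To see why the repack/prune pair shrinks the repo, here is a hedged, standalone sketch: an object that nothing references (standing in for the binaries BFG just unhooked from history) survives an ordinary gc but disappears once it is pruned:

    ```shell
    # Illustrative only: demonstrate that repack + prune removes
    # unreachable objects. Uses a throwaway repo.
    set -e
    repo=$(mktemp -d)
    git init -q "$repo" && cd "$repo"
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "init"

    # Write a blob that nothing references (stands in for the large
    # binaries after BFG has removed every reference to them)
    blob=$(echo "pretend this is a multi-GB binary" | git hash-object -w --stdin)
    git cat-file -e "$blob"      # still present: loose but unreferenced

    git gc -q                    # default gc keeps recent unreachable objects
    git repack -A -d -q          # -A loosens in-pack garbage instead of dropping it
    git prune --expire=now       # deletes loose unreachable objects immediately

    git cat-file -e "$blob" 2>/dev/null && echo "still there" || echo "pruned"
    ```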
