Someone accidentally committed some large (multi-GB) binaries to my self-hosted gitlab repository, and now every time someone tries to pull from the repository the server ge
Had the same problem and the process to get it resolved was quite involved.
We run the community-maintained sameersbn/gitlab 11.4.5 in a Docker container. I didn't want to install bfg
there, so I opted to perform the changes locally.
# Install the bfg tool, e.g. on macOS via Homebrew
brew install bfg
# Clone repo locally
cd ~/Development
git clone --mirror ssh://git@server.com:22/some/dir/myrepo.git
# Clean the repo
bfg --delete-files \*.pdf myrepo.git
cd myrepo.git
# A mirror clone is bare, so refs live at the top level (there is no .git subdirectory)
rm -rf refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
# Upload to container-host, e.g. via FileZilla
# Connect to the container-host via ssh
# Rename the original directory in the container, to have a backup
docker exec -it gitlab /bin/bash
mv /home/git/data/repositories/some/dir/myrepo.git /home/git/data/repositories/some/dir/myrepo.git.mybackup
exit
# Copy from container-host into container
docker cp /root/Documents/myrepo.git gitlab:/home/git/data/repositories/some/dir/myrepo.git
# Fix permissions in container
docker exec -it gitlab /bin/bash
cd /home/git/data/repositories/some/dir/myrepo.git
find . -type f -print0 | xargs -0 chown git:git
chown -R git:git /home/git/data/repositories/some/dir/myrepo.git
chmod 770 /home/git/data/repositories/some/dir/myrepo.git
# Re-create the "hooks" subdir with some symlinks in the repo
cd /home/git/gitlab/bin
./rake gitlab:shell:create_hooks
# Clear Redis cache (unclear if needed)
./rake cache:clear
exit
# Clone the changed repo locally again, and tell everyone who has a copy to clone again (history has been rewritten)
# Then push a commit to the repo, to hit the hook and trigger a size recheck
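Before uploading the cleaned mirror, it can help to confirm that the big files are really gone from history. The following is only a minimal sketch, assuming a reasonably recent git; the function name largest_blobs is made up for illustration and is not part of git or bfg.

```shell
# Hypothetical helper: print "<size in bytes> <path>" for the largest blobs
# reachable from any ref in the given repository (works for bare repos too).
largest_blobs() {
  repo=$1
  count=${2:-10}
  # List every reachable object with its path, ask git for type and size,
  # keep only blobs, then sort by size (largest first).
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -rn |
    head -n "$count"
}
```

For example, `largest_blobs ~/Development/myrepo.git 5` run after the bfg and gc steps should no longer list the PDFs. If some still show up, keep in mind that bfg by default leaves the contents of the latest commit untouched.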
Doing this rewrites history, so it breaks the repositories of anyone who has pulled or pushed work based on this commit. You will have to tell them.
What you need to do is rebase your branch to drop the commit, then replace the branch on the remote.
First, rebase in your local repository.
git rebase -i problematicCommit~1
This will open your default editor. Delete the line for problematicCommit, then save the file and close it.
Next, delete the branch from your remote repository.
git push origin :nameOfTheBranch
Note the colon before the branch name: pushing an empty source to a branch deletes that branch on the remote.
Finally, recreate the branch on the remote.
git push origin nameOfTheBranch
This recreates the branch on the remote without the problematic commit, and new clones will be fast again.
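If you would rather not edit the todo list by hand, the same rebase can be done non-interactively. This is a sketch under assumptions, not the only way to do it: both function names are hypothetical, and it assumes the offending commit only added the big file and is not the very first commit of the branch.

```shell
# Hypothetical helper: print the hash of the commit(s) that first added a path.
find_adding_commit() {
  repo=$1
  path=$2
  git -C "$repo" log --all --diff-filter=A --format='%H' -- "$path"
}

# Hypothetical helper: replay everything after <commit> onto its parent,
# which drops <commit> from the currently checked-out branch.
# Fails if <commit> is the root commit, because it has no parent.
drop_commit() {
  repo=$1
  bad=$2
  git -C "$repo" rebase --quiet --onto "$bad~1" "$bad"
}
```

Something like `drop_commit . "$(find_adding_commit . path/to/big.bin)"` has the same effect as deleting that line in the interactive editor. Later commits that touch the same file will still produce conflicts you have to resolve.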
If the repository is still slow after that, you can erase the unreachable objects (e.g. the ones holding the big file) that it still contains.
First, remove all tags and branches that could be pointing to the old commits. This is important because old commits can only be erased once nothing references them.
Then, following VonC's answer at stackoverflow.com/a/28720432/6309, run the following in your repository and in the remote one:
git gc
git repack -Ad
git prune
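To see whether these commands actually freed anything, you can compare the object store size before and after. A minimal sketch, assuming the standard output format of git count-objects -v; the function name repo_object_kib is made up for this example.

```shell
# Hypothetical helper: print the disk space (in KiB) used by loose objects
# plus packfiles, as reported by "git count-objects -v".
repo_object_kib() {
  git -C "$1" count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}
```

Run `repo_object_kib .` once before and once after the cleanup. If the number barely moves, something (a tag, a branch, or a reflog entry) is probably still referencing the old commits.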
As the OP Karl confirms in the comments, running BFG repo cleaner on the server side (directly in the bare repo) is enough to remove the large binaries.
If you follow that with (as mentioned in "Git - Delete a Blob"):
rm -rf refs/original/ # in a bare repo, refs sit at the top level, not under .git/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
But also ("git gc --aggressive vs git repack"):
git gc
git repack -Ad # kills in-pack garbage
git prune # kills loose garbage
You should end up with a smaller bare repo.
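The commands above can be wrapped into one function so they are easy to run against the bare repo on the server. This is just a sketch: the name shrink_bare_repo is hypothetical, and it assumes every ref pointing at the unwanted history has already been rewritten or deleted.

```shell
# Hypothetical helper: expire reflogs, then collect and prune everything
# that is no longer reachable in the given (bare) repository.
shrink_bare_repo() {
  repo=$1
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --quiet --prune=now &&
    git -C "$repo" repack -A -d --quiet && # kills in-pack garbage
    git -C "$repo" prune                   # kills loose garbage
}
```

Inside the container from the first answer this could be called as `shrink_bare_repo /home/git/data/repositories/some/dir/myrepo.git`, ideally after taking a backup copy of the directory.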