Question
We have a big git repository, which I want to push to a self-hosted gitlab instance.
The problem is that the gitlab remote does not let me push my repo:
git push --mirror https://mygitlab/xy/myrepo.git
This gives me the following error:
Enumerating objects: 1383567, done.
Counting objects: 100% (1383567/1383567), done.
Delta compression using up to 8 threads
Compressing objects: 100% (207614/207614), done.
remote: error: object c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867:
duplicateEntries: contains duplicate file entries
remote: fatal: fsck error in packed object
So I ran git fsck:
error in tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867: duplicateEntries: contains duplicate file entries
error in tree 0d7286cedf43c65e1ce9f69b74baaf0ca2b73e2b: duplicateEntries: contains duplicate file entries
error in tree 7f14e6474400417d11dfd5eba89b8370c67aad3a: duplicateEntries: contains duplicate file entries
The next thing I did was to check git ls-tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867:
100644 blob c233c88b192acfc20548d9d9f0c81c48c6a05a66 fileA.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0 fileB.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0 fileB.cs
100644 blob d2a4248bcda39c0dc3827b495f7751b7cc06c816 fileC.xaml
Notice that fileB.cs appears twice, with the same hash. I assume that this is the problem, because why would the same file appear twice in the same tree with the same file name and blob hash?
Now I googled the problem but could not find a way to fix this. One seemingly good resource I found was this: Tree contains duplicate file entries
However, it basically comes down to using git replace, which does not really fix the problem: git fsck will still print the error and the remote will still reject the push.
Then there is this answer, which seems to remove the file entirely (but I still need the file, just once instead of twice in the tree): https://stackoverflow.com/a/44672692/826244
Is there any other way to fix this? It really should be possible to fix it so that git fsck does not report any errors, right? I am aware that I will need to rewrite the entire history after the corrupted commits. I could not even find a way to determine which commits point to the broken trees; otherwise I might be able to use rebase and patch the corrupted commits, or something similar. Any help would be greatly appreciated!
UPDATE: Pretty sure I know what to do, but not yet how to do it:
- Create a new tree object from the old tree, but corrected, with git mktree <- done
- Create a new commit that is identical to the old one referencing the bad tree, but pointing at the newly fixed tree <- difficult: finding the commit for a given tree currently takes an hour or more, and I do not know how to create the modified commit once I have found it (see the sketch below)
- Run git filter-branch -- --all <- should persist the replacement commits
Sadly I cannot just use git replace --edit on the bad tree and then run git filter-branch -- --all, because filter-branch seems to work only on commits and ignores tree replacements...
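For illustration, here is a minimal shell sketch of the first two steps. It assumes the bad tree hash reported by git fsck above; the loop is the slow search described in the second item:

BAD=c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867
# Deduplicate the entries and write a corrected tree object.
# ls-tree prints entries in tree order, so identical duplicates are
# adjacent and uniq drops them; git mktree sorts its input itself.
FIXED=$(git ls-tree "$BAD" | uniq | git mktree)
# Find every commit whose (recursive) tree contains the bad tree.
for c in $(git rev-list --all); do
  git ls-tree -r -t "$c" | grep -q "$BAD" && echo "$c"
done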
Answer 1:
You can try running git fast-export to export your repository into a data file, and then run git fast-import to re-import the data file into a new repository. Git will remove any duplicate entries during the fast-import process, which will solve your problem.
Be aware that you may have to make a decision about how to handle signed tags and such when you export, by passing appropriate arguments to git fast-export; since you're rewriting history, you probably want to pass --signed-tags=strip.
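A minimal sketch of that round trip, assuming the clean repository is created next to the original (all paths are illustrative):

cd /path/to/myrepo                          # the broken repository
git fast-export --all --signed-tags=strip > ../myrepo.export
mkdir ../myrepo-clean && cd ../myrepo-clean && git init
git fast-import < ../myrepo.export
git fsck                                    # verify the duplicates are gone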
Answer 2:
The final solution was to write a tool that tackles this problem.
The first step was to run git unpack-objects on all packfiles.
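As a sketch, that step looks roughly like this; git unpack-objects skips objects that already exist in the repository, so the packs have to be moved out of .git first (paths are illustrative):

mkdir ../packs
mv .git/objects/pack/pack-* ../packs/
for p in ../packs/*.pack; do git unpack-objects < "$p"; done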
Then I had to identify the commits that pointed to the trees with duplicate entries, by reading all refs and walking back through history checking all the trees.
Once I had the tooling for that, it was not hard to rewrite the trees of those commits and then rewrite all commits after them. After that I had to update the changed refs. This is the moment where I thoroughly tested the result, as nothing had been lost yet.
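For a single commit, the rewrite step can be sketched like this, assuming $C is a commit whose root tree is the bad one and $FIXED is the corrected tree from git mktree (this is only the core idea; all descendant commits must be rewritten the same way, with their parent lines updated):

NEW=$(git cat-file commit "$C" |
      sed "1s/^tree .*/tree $FIXED/" |
      git hash-object -t commit -w --stdin)
# $NEW replaces $C; use it when rewriting the parent lines of its children.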
Finally, a git reflog expire --expire=now --all && git gc --prune=now --aggressive rewrote the pack and removed all loose objects that are no longer reachable.
When I have the time I will upload the source code to GitHub, as it performs really well and could serve as a template for similar problems. It ran for only a few minutes on a 3.7 GB repository (about 20 GB unpacked). By now I have also implemented reading from the packfiles directly, so there is no need to unpack anything anymore (which takes a lot of time and space).
Update: I worked a little more on the source and it now performs really well, even better than BFG for deleting a single file (no option switches yet). The source code is available here: https://github.com/TimHeinrich/GitRewrite. Be aware that this was only tested against a single repository, and only under Windows on a Core i7. It is highly unlikely that it will work on Linux or with any other processor architecture.
Answer 3:
You can delete the related refs and expire their objects.
In order to find the related refs run:
$ git log --all --format=raw --raw -t --no-abbrev
and search for the sha of the change, then find it in the output of $ git show-ref
Next, for each ref holding the bad objects do:
$ git update-ref -d refs/changes/xx/xxxxxx/x
Finally, expire the objects and run fsck; it should be fixed.
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive
$ git fsck
Answer 4:
I found an issue related to GitLab not supporting fsck.skipList, and I think the solution may apply:
In order to push to a new project in GitLab, the reporter used the import feature when creating that GitLab project and had it import straight from his other repository.
Note: this did not fix the repository locally, but it allowed the import to succeed, and perhaps importing that way generated a clean branch remotely.
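For comparison, on a plain git server where you control the configuration, the known-bad objects can be whitelisted instead of fixed. This is only a sketch assuming shell access to the server; the skiplist path is an example:

# on the server: list the offending object IDs, one per line
echo c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867 >> /srv/git/fsck-skiplist
git config --system receive.fsck.skipList /srv/git/fsck-skiplist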
Source: https://stackoverflow.com/questions/56344067/git-fsck-duplicateentries-contains-duplicate-file-entries-cannot-push-to-git