Moving a large number of large files in a Git repository

Posted by 自作多情 on 2019-12-02 01:08:27

Question


My repository has a large number of large files. They are mostly data (text). Sometimes I need to move these files to another location because of refactoring or packaging.

I use the git mv command to "rename" the path of the files, but it seems inefficient: the size of the commit (the actual diff size) is huge, the same as with rm followed by git add.

Are there other ways to reduce the commit size? Or should I just add the files to .gitignore and upload them upstream as a zip file?


Thank you for the answers.

FYI, the following series of commands prints the size of the file bar:

git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar

From this I concluded that git did an rm and an add. Could you tell me whether this number is related to the actual commit size or not?


Answer 1:


By design, if you move a file inside a Git repository without changing its content, creating a commit only stores new metadata (a.k.a. tree objects) to represent the new file location. Since the content is unchanged, Git doesn't need to create a new blob object to store the file content. So the "commit size" should be rather small.
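One way to check this claim is to count the loose objects before and after a rename. The following is a minimal sketch, assuming a throwaway repository created with mktemp, a placeholder identity, and a small sample file named foo (none of these come from the question):

```shell
#!/bin/sh
set -e

# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

# Commit a file named foo: this stores one blob, one tree and one commit.
printf 'some large data\n' > foo
git add foo
git commit -q -m "add foo"
objects_before=$(git count-objects | cut -d' ' -f1)

# Rename it and commit: only a new tree and a new commit are written.
git mv foo bar
git commit -q -m "modify"
objects_after=$(git count-objects | cut -d' ' -f1)

echo "loose objects before: $objects_before, after: $objects_after"
```

In this sketch the count should grow by two (the new tree and the new commit), however large foo is, because the existing blob is reused.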

Since you say that the diff size is huge, I suspect that some file content was modified along with the relocation. That would explain a huge "commit size".
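To see whether a move also changed content, one option is to ask git diff for rename detection and look at the similarity score. This is only a sketch under the same assumptions as before (temporary repository, placeholder identity, sample file foo):

```shell
#!/bin/sh
set -e

# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

printf 'line 1\nline 2\n' > foo
git add foo
git commit -q -m "add foo"

git mv foo bar
git commit -q -m "modify"

# -M enables rename detection. A status of R100 means the path was
# renamed and the content is 100% identical; a lower score (or a
# separate D/A pair) means the content changed along with the move.
status=$(git diff -M --name-status HEAD~1 HEAD)
echo "$status"
```

Running the same git diff command on a real commit lists each moved file with its score, which makes it easy to spot the files whose content was touched.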

In either case, you can try to shrink the .git directory with the command git gc --prune --aggressive

EDIT :

git mv foo bar
git commit -m "modify"
git cat-file -s HEAD:bar

These commands create a new commit, but since the content of the foo/bar file has not changed, Git won't store anything new except the new file name. In fact, in your example, git cat-file -s HEAD:foo before the rename and git cat-file -s HEAD:bar after it give the same result, because it is the same content (the same blob in .git/objects). The command reports the size of the file's content, not the size of the commit. I think you are misinterpreting what Git does internally. Have a look at Git objects for further explanation.

Remember that git tracks content, not files.
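The point about the shared blob can be checked directly by comparing object IDs rather than sizes. A minimal sketch, again assuming a temporary repository, a placeholder identity, and a sample file foo:

```shell
#!/bin/sh
set -e

# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false

printf 'some data\n' > foo
git add foo
git commit -q -m "add foo"

# Object ID and size of the blob behind foo before the rename.
blob_before=$(git rev-parse HEAD:foo)
size_before=$(git cat-file -s HEAD:foo)

git mv foo bar
git commit -q -m "modify"

# Object ID and size of the blob behind bar after the rename.
blob_after=$(git rev-parse HEAD:bar)
size_after=$(git cat-file -s HEAD:bar)

echo "before: $blob_before ($size_before bytes)"
echo "after:  $blob_after ($size_after bytes)"
```

Both lines should show the same object ID, which means bar points at the very blob that foo used and no second copy of the content was stored.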




Answer 2:


Moving things around in git does not meaningfully change the size of the repository. Each distinct file content is stored exactly once, no matter how many paths refer to it. The repository only grows if you start changing those huge files, because each new version is then stored separately.

Have a look at git-annex, maybe that is the right thing for you.



Source: https://stackoverflow.com/questions/16846024/moving-large-number-of-large-files-in-git-repository
