Do version control systems use diffs to store binary files?

Posted by 允我心安 on 2021-02-20 18:47:17

Question


How do popular version control systems (svn, git) handle storing revisions to a binary document? I have projects with binary sources that are periodically updated and need to be checked in (mostly Photoshop documents, custom data formats and a few word processing documents). I've always been worried about checking in the binaries because I thought that the VCS might take the simple route of uploading a new copy of the binary each time, and hence my repository would get huge quickly.

If I have several data blocks (let's call them A, B, C, D, etc) and I have a binary file that on first check in looks like ABC, but then on the second check in has been modified to ADBE, will my VCS be smart enough to only store the changed bits or will it create an entirely new image of the file?


Answer 1:


tl;dr

Git can store just diffs of binary files, but it's not very efficient at it, so you should probably use an external tool like Git LFS.

Slightly longer explanation

By default, Git doesn't store diffs between commits. When you change a file and make a new commit, Git stores a new object with the content of the whole file. It doesn't matter whether you change just one line or rewrite the whole file: Git doesn't store diffs, at least not at first. A part of Git called git-gc (the garbage collector) handles tasks such as removing dangling commits and optimizing the repository. It runs another Git command, git-repack, which does exactly what you ask for: it takes a whole bunch of objects and stores them inside one pack using delta compression.
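To make the idea of delta compression concrete, here is a minimal Python sketch of the ABC to ADBE case from the question. It is not Git's actual algorithm or pack format. It only illustrates the general principle that a delta can be expressed as "copy this range from the old version" and "insert these new bytes" instructions. The block size, the seeded random data and the function names are all assumptions made up for the illustration, and `difflib.SequenceMatcher` stands in for a real delta-matching algorithm.

```python
import random
from difflib import SequenceMatcher

BLOCK = 256  # hypothetical block size in bytes
rng = random.Random(0)  # fixed seed so runs are repeatable


def random_block() -> bytes:
    """Return BLOCK pseudo-random bytes standing in for one data block."""
    return bytes(rng.randrange(256) for _ in range(BLOCK))


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # bytes that exist only in `new`
        # "delete" needs no instruction: those old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new version from the old version and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Count the bytes the delta has to carry itself."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


A, B, C, D, E = (random_block() for _ in range(5))
old = A + B + C          # first check-in
new = A + D + B + E      # second check-in
delta = make_delta(old, new)

print("new version:", len(new), "bytes")
print("literal bytes in delta:", literal_bytes(delta))
```

In this sketch the delta only has to carry roughly the bytes of D and E, while A and B are referenced from the old version. Whether a real repository gets a similar saving depends on the tool and on how much of the file is actually unchanged at the byte level.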

Unfortunately, packing with git-repack is not especially efficient when it comes to compressing binary files. You can always tweak it, but if your files change a lot, or if they are really big, you should probably use an external tool like Git LFS.




Answer 2:


We use CollabNet Subversion Edge.

I just had occasion to commit a 50 megabyte Photoshop .psd file within which I had updated Smart Filter parameters.

09/18/2016  05:15 PM        53,015,186 StarSpikesPro4RealismTest.psd

My SVN repository size grew from:

 Total Files Listed:
       19157 File(s) 26,148,088,902 bytes

to

 Total Files Listed:
       19159 File(s) 26,152,019,035 bytes

The repository grew by 3,930,133 bytes, which is about 7.4% of the size of the .psd file, so quite clearly the entire 50 megabyte file wasn't stored again; a delta was calculated instead.

Keep in mind that some files, e.g., Photoshop images, may themselves be compressed by their associated application, so the binary contents of the stored file may be entirely different from edit to edit, and thus won't yield good delta performance on any system. But you could choose to disable that compression in Photoshop. This one was actually compressed on save, but even with such compression enabled we saw only a small growth in the repository size.
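The effect of application-level compression on deltas can be illustrated with a small Python sketch. It uses `zlib` as a stand-in for whatever compression an application applies on save, which is an assumption: Photoshop's real format is not modeled here. The block size, byte ranges and helper names are hypothetical. The sketch inserts one new block into otherwise unchanged data and measures how much of the old version survives as a common prefix plus common suffix, before and after compression.

```python
import random
import zlib

BLOCK = 1024  # hypothetical block size in bytes
rng = random.Random(1)  # fixed seed so runs are repeatable


def block(low: int, high: int) -> bytes:
    """BLOCK pseudo-random bytes drawn from a narrow range, so they compress."""
    return bytes(rng.randrange(low, high) for _ in range(BLOCK))


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of `old` covered by the common prefix plus common suffix."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return (prefix + suffix) / len(old)


A, B, C = block(0, 16), block(0, 16), block(0, 16)
D = block(16, 32)        # newly inserted content with different byte values

old = A + B + C          # before the edit
new = A + D + B + C      # after the edit: one block inserted

c_old = zlib.compress(old)   # what gets committed if the app compresses on save
c_new = zlib.compress(new)

print(f"uncompressed: {shared_fraction(old, new):.0%} of the old bytes reusable")
print(f"compressed:   {shared_fraction(c_old, c_new):.0%} of the old bytes reusable")
```

Prefix-plus-suffix overlap is a crude measure, and a real delta algorithm looks for matches anywhere in the file. Still, it shows the tendency described above: the uncompressed versions share all of the old content, while the compressed streams typically diverge within the first few bytes.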

In my experience, an SVN repository used primarily for code development and storage of some associated binary files doesn't seem to grow quickly at all. It's hard to compare specifics, but the above repository, 8 years old and worked on actively by 2 people full-time, containing Visual Studio solutions and a mix of downloaded libraries, non-source-code development files such as graphics, build results, documentation, etc., has only grown to 26 gigabytes. The server has a RAID 5 array of three 120 GB SSDs and I don't anticipate it needing an upgrade for years.

-Noel




Answer 3:


How do popular version control systems (svn, git) handle storing revisions to a binary document?

Quite smartly, and some are smarter than others (but all of them store changes, not a full new version of each artifact).

In some quick and dirty tests I ran a while ago (in the days of Git 1.7.*), the same test case (the same sequence of changes to several megabytes of binaries) produced a slightly smaller repository (by a few percent) with SVN than with Git.

But, on the other hand:

Git LFS or Mercurial with the LargeFiles extension lets you store binaries (mostly LARGE ones) outside the repository (the repo holds only pointers to objects in an external location) and get the best of both worlds: a fast, small repo and versioned binaries.
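For Git LFS, the "pointer" kept in the repository is a tiny text file that records the SHA-256 and size of the real content. The Python sketch below builds such pointer text following the layout of the Git LFS pointer specification (version 1). The function name and the sample payload are made up for the illustration, and in practice the `git lfs` client generates these files for you.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build Git LFS style pointer text for `content` (illustrative helper)."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a large binary such as a .psd file (50 MB of zero bytes).
big_binary = bytes(50_000_000)
pointer = lfs_pointer(big_binary)

print(pointer)
print(f"{len(big_binary):,} bytes of content -> {len(pointer)} bytes in the repo")
```

The repository history then only accumulates these small pointer files, while each version of the binary itself lives in the external LFS store.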



Source: https://stackoverflow.com/questions/39522863/do-version-control-systems-use-diffs-to-store-binary-files
