Why does Git use the SHA1 of the *compressed* objects rather than the SHA1 of the original objects?

Submitted by 送分小仙女 on 2019-12-23 07:14:09

Question


I'm curious why this choice was made. Because Git doesn't use the SHA1 of the raw blobs, it basically rules out changing the compression algorithm used by Git. Perhaps there is some efficiency consideration here: maybe zlib is faster at compressing a file than the SHA1 algorithm is at creating the hash, so compressing before hashing is faster?

Here is a link to the original Git README by Linus: http://git.kernel.org/?p=git/git.git;a=blob;f=README;h=27577f76849c09d3405397244eb3d8ae1d11b0f3;hb=e83c5163316f89bfbde7d9ab23ca2e25604af290

And here is the relevant paragraph:

"There are several kinds of objects in the content-addressable collection database. They are all in deflated with zlib, and start off with a tag of their type, and size information about the data. The SHA1 hash is always the hash of the compressed object, not the original one."


Answer 1:


Like you said, it is the original README, when Git was started. Since then, it has been changed so that the SHA1 is computed before compressing.

"It’s worth noting that the SHA-1 hash that is used to name the object is the hash of the original data plus this header, so 'sha1sum' file does not match the object name for file. (Historical note: in the dawn of the age of git the hash was the SHA-1 of the compressed object.)"

http://schacon.github.com/git/user-manual.html#object-details
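
To make the quoted rule concrete, here is a minimal Python sketch of how a blob's object name can be reproduced. It assumes a repository using the SHA-1 object format and the header layout `blob <size>\0`; the helper name `git_blob_id` is made up for this illustration and is not part of Git.

```python
import hashlib
import zlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object name of `data` stored as a blob.

    Hypothetical helper: hashes the uncompressed header plus content,
    assuming the "blob <size>\\0" header used for blob objects.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


content = b"hello world\n"

# The object name is computed from header + raw content.
print(git_blob_id(content))

# A plain SHA-1 of the file contents (what sha1sum reports) differs,
# because it does not include the header.
print(hashlib.sha1(content).hexdigest())

# Compression only affects how the object is stored on disk. Different
# zlib levels decompress to the same bytes, so the name stays the same.
store = b"blob %d\0" % len(content) + content
fast = zlib.compress(store, 1)
best = zlib.compress(store, 9)
print(zlib.decompress(fast) == zlib.decompress(best))
```

If Git is installed, the first value should agree with what `git hash-object` reports for a file with the same contents. Because the hash is taken before compression, the compression settings could change without renaming any object, which is the concern raised in the question.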



Source: https://stackoverflow.com/questions/8276149/why-does-git-use-the-sha1-of-the-compressed-objects-rather-than-the-sha1-of-th
