Why does git hash-object return a different hash than openssl sha1?

风格不统一 提交于 2019-11-27 20:22:25
Mark Longair

You see a difference because git hash-object doesn't just take a hash of the bytes in the file - it prepends the string "blob " followed by the file size and a NUL to the file's contents before hashing. There are more details in this other answer on Stack Overflow:

Or, to convince yourself, try something like:

$ echo -n hello | git hash-object --stdin
b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0

$ printf 'blob 5\0hello' > test.txt
$ openssl sha1 test.txt
SHA1(test.txt)= b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0

The SHA1 digest is calculated over a header string followed by the file data. The header consists of the object type, a space and the object length in bytes as decimal. This is separated from the data by a null byte.

So:

$ git hash-object foo.txt
f70f10e4db19068f79bc43844b49f3eece45c4e8
$ ( perl -e '$size = (-s shift); print "blob $size\x00"' foo.txt \
               && cat foo.txt ) | openssl sha1
f70f10e4db19068f79bc43844b49f3eece45c4e8

One consequence of this is that "the" empty tree and "the" empty blob have different IDs. That is:

e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 always means "empty file" 4b825dc642cb6eb9a060e54bf8d69288fbee4904 always means "empty directory"

You will find that you can in fact do git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 in a new git repository with no objects registered, because it is recognised as a special case and never actually stored (with modern Git versions). By contrast, if you add an empty file to your repo, a blob "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" will be stored.

twcamper

The answer lies here:

How to assign a Git SHA1's to a file without Git?

git calculates on file metadata + contents, not just contents.

That is a good enough answer for now, and the takeaway is that git is not the tool for checksumming downloads.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!