Why does git hash-object return a different hash than openssl sha1?

广开言路 2020-12-05 12:42

Context: I downloaded a file (Audirvana 0.7.1.zip) from code.google to my MacBook Pro (Mac OS X 10.6.6).

I wanted to verify the checksum, which for that particular f

4 answers
  • 2020-12-05 13:08

    You see a difference because git hash-object doesn't just hash the bytes in the file: it prepends the string "blob ", followed by the file size in bytes (as a decimal number) and a NUL byte, to the file's contents before hashing. There are more details in this other answer on Stack Overflow:

    • How to assign a Git SHA1's to a file without Git?

    Or, to convince yourself, try something like:

    $ echo -n hello | git hash-object --stdin
    b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0
    
    $ printf 'blob 5\0hello' > test.txt
    $ openssl sha1 test.txt
    SHA1(test.txt)= b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0
    
  • 2020-12-05 13:13

    Git stores objects as [Object Type, Object Length, delimiter (\0), Content]. In your case:

    $ echo "A" | git hash-object --stdin
    f70f10e4db19068f79bc43844b49f3eece45c4e8
    

    Try calculating the hash as:

    $ echo -e "blob 2\0A" | shasum 
    f70f10e4db19068f79bc43844b49f3eece45c4e8  -
    

    Note the -e flag (needed in bash so that \0 is interpreted as a NUL byte) and that the length is 2, not 1, because echo appends a newline to "A".
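
The effect of the trailing newline can be checked without git. The following is a minimal sketch, assuming either `sha1sum` or `shasum` is on the PATH; the helper name `blob_sha1` is made up for this illustration. It uses printf instead of echo -e, since printf treats \0 the same way across shells:

```shell
#!/bin/sh
# Pick whichever SHA-1 tool is installed (assumption: one of the two exists).
if command -v sha1sum >/dev/null 2>&1; then
    SHA1_TOOL=sha1sum
else
    SHA1_TOOL=shasum
fi

# blob_sha1 (hypothetical helper): read content from stdin, prepend the
# header "blob <size>\0", and print the SHA-1 of header + content.
blob_sha1() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                         # buffer stdin so its size can be measured
    size=$(( $(wc -c < "$tmp") ))        # arithmetic strips any padding from wc
    { printf 'blob %s\0' "$size"; cat "$tmp"; } | "$SHA1_TOOL" | cut -d' ' -f1
    rm -f "$tmp"
}

printf 'A\n' | blob_sha1   # 2 bytes of content, like: echo "A" | git hash-object --stdin
printf 'A'   | blob_sha1   # 1 byte of content, so both header and hash differ
```

The first call should print the same f70f10e4... value shown above; the second prints a different hash because both the length in the header and the content have changed.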

  • 2020-12-05 13:21

    The answer lies here:

    How to assign a Git SHA1's to a file without Git?

    Git computes the hash over a header (the object type and the content length) plus the contents, not just the contents.

    That is a good enough answer for now, and the takeaway is that git is not the tool for checksumming downloads.
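
For the original goal, checking a download against a published checksum, a plain SHA-1 over the raw bytes is what is needed. A rough sketch, assuming `sha1sum` or `shasum` is available; the function name `verify_sha1` is a placeholder invented here:

```shell
#!/bin/sh
# verify_sha1 FILE EXPECTED (hypothetical helper): compare the plain SHA-1
# of FILE's bytes (no git header) with the published hex digest EXPECTED.
verify_sha1() {
    file=$1
    # Published checksums are sometimes upper case; normalise to lower case.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if command -v sha1sum >/dev/null 2>&1; then
        actual=$(sha1sum < "$file" | cut -d' ' -f1)
    else
        actual=$(shasum < "$file" | cut -d' ' -f1)
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: got $actual" >&2
        return 1
    fi
}
```

It would be called as `verify_sha1 "Audirvana 0.7.1.zip" <published checksum>`, with the checksum taken from the download page.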

  • The SHA1 digest is calculated over a header string followed by the file data. The header consists of the object type, a space and the object length in bytes as decimal. This is separated from the data by a null byte.

    So:

    $ git hash-object foo.txt
    f70f10e4db19068f79bc43844b49f3eece45c4e8
    $ ( perl -e '$size = (-s shift); print "blob $size\x00"' foo.txt \
                   && cat foo.txt ) | openssl sha1
    f70f10e4db19068f79bc43844b49f3eece45c4e8
    

    One consequence of this is that "the" empty tree and "the" empty blob have different IDs. That is:

    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 always means "empty file"
    4b825dc642cb6eb9a060e54bf8d69288fbee4904 always means "empty directory"

    You will find that you can in fact do git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 in a new git repository with no objects registered, because it is recognised as a special case and never actually stored (with modern Git versions). By contrast, if you add an empty file to your repo, a blob "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" will be stored.
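
Both IDs follow directly from the header rule: for an empty object there is no content, so the ID is just the SHA-1 of the header, "blob 0\0" or "tree 0\0". A small sketch, assuming `sha1sum` or `shasum` is installed (the helper name `empty_object_id` is invented for this example):

```shell
#!/bin/sh
# empty_object_id TYPE (hypothetical helper): SHA-1 of the bare header
# "<type> 0\0", which is the whole input for an object with no content.
empty_object_id() {
    if command -v sha1sum >/dev/null 2>&1; then
        printf '%s 0\0' "$1" | sha1sum | cut -d' ' -f1
    else
        printf '%s 0\0' "$1" | shasum | cut -d' ' -f1
    fi
}

empty_object_id blob   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
empty_object_id tree   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The only difference between the two inputs is the object type in the header, which is why the empty file and the empty directory get distinct IDs.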
