Difference between `block_size` and `digest_size` in hashlib?

Question

I was going through the Python hashlib package documentation and wanted some clarification on two hash object attributes (namely hash.block_size and hash.digest_size). Here is the definition of each attribute:

hash.digest_size = "The size of the resulting hash in bytes."
hash.block_size = "The internal block size of the hash algorithm in bytes."
source: https://docs.python.org/2/library/hashlib.html

So I understand that hash.digest_size is simply the length (in bytes) of the data once it has been hashed or "digested" by the hash object. For example, in the code below, hashing the string 'Hello World' with a SHA256 hash object gives a digest_size of 32 bytes (256 bits).

import hashlib

hash_object = hashlib.sha256()
hash_object.update(b'Hello World')
hex_dig = hash_object.hexdigest()

print(hex_dig)
# a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
print(hash_object.digest_size)  # 32
print(hash_object.block_size)   # 64
print(len(hex_dig))             # 64

What I don't understand is this hash.block_size attribute. Is it simply the length of characters required to represent the hexadecimal representation of the hashed data? Is it something else entirely? I don't quite understand the definition of this attribute so any clarification on this would be very helpful & insightful!

Answer 1:

The hash is computed over input data of arbitrary length. Most hash functions do this with a function that updates an internal state based on a fixed-size block of data; the input you are hashing (a file, for example) is processed in chunks of this fixed block size.
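To relate this to the numbers in the question, here is a small sketch (the inputs are arbitrary, chosen only for illustration). It suggests that digest_size does not depend on the input length, that the hex string is twice digest_size because each byte is written as two hex characters, and that block_size is a separate property of the algorithm. For SHA-256 the hex length and block_size both happen to be 64, which looks like a coincidence rather than a relationship: SHA-512 does not show it.

```python
import hashlib

# Inputs of very different lengths, all hashed with SHA-256.
for data in (b"", b"Hello World", b"x" * 10_000):
    h = hashlib.sha256(data)
    # digest_size counts raw bytes; hexdigest() uses two hex characters per byte.
    print(len(data), h.digest_size, len(h.digest()), len(h.hexdigest()), h.block_size)

# SHA-512: 64-byte digest, 128 hex characters, 128-byte internal block.
h512 = hashlib.sha512(b"Hello World")
print(h512.digest_size, len(h512.hexdigest()), h512.block_size)
```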

So most hash functions have an internal state of fixed size (often digest_size bytes, but sometimes larger) that is set to fixed initial values by an initialise function (or by the constructor when a hash is created with no input). For SHA-256 (and SHA-224 as well) this state is 32 bytes, or more precisely 8 32-bit integers.

Then it processes the input data in chunks. For SHA-256 each chunk is 64 bytes, which are interpreted as 16 32-bit integers; a longish computation is then done on the 8 state integers and the 16 data integers, which yields a new state of 8 integers. This goes on for as long as there is input data. The size of the input chunk is the block_size.
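The chunked processing can be observed from the outside with a minimal sketch like the following (the input data is made up for illustration). Feeding the data to update() one block_size chunk at a time should give the same digest as hashing everything in one call.

```python
import hashlib

data = b"Hello World" * 1000  # illustrative input, 11,000 bytes

# One-shot hashing of the whole input.
one_shot = hashlib.sha256(data).hexdigest()

# Incremental hashing, feeding block_size bytes per update() call.
h = hashlib.sha256()
step = h.block_size  # 64 for SHA-256
for i in range(0, len(data), step):
    h.update(data[i:i + step])
chunked = h.hexdigest()

print(one_shot == chunked)
```

Note that update() accepts chunks of any length, since the hash object buffers partial blocks internally. Using block_size as the step here is only meant to mirror the description above, not a requirement of the API.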

When we want to compute a digest (typically at the end of the data) we pad out the remaining data to a whole number of blocks, with the total hashed length so far stored at the end of the final 64-byte block (16 integers), and run the same transformation on the padded data. The digest function then outputs the final state, or part of it (SHA-224 outputs only 224 bits = 28 bytes of its 32-byte state), as the digest. The size of the final output (in bytes) is digest_size.
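The padding arithmetic can be sketched as follows. This assumes the standard SHA-256 padding layout (a single 0x80 byte, then zero bytes, then the message length in bits as an 8-byte big-endian integer), which the answer above does not spell out. The helper name sha256_padding is hypothetical and is not part of hashlib.

```python
import hashlib

BLOCK = hashlib.sha256().block_size  # 64 bytes

def sha256_padding(msg_len: int) -> bytes:
    """Hypothetical helper: the bytes appended to a msg_len-byte message."""
    # 1 marker byte + zeros + 8 length bytes must end on a block boundary.
    zeros = (BLOCK - 9 - msg_len) % BLOCK
    return b"\x80" + b"\x00" * zeros + (msg_len * 8).to_bytes(8, "big")

for n in (0, 11, 55, 56, 64):
    total = n + len(sha256_padding(n))
    # Message length, padded length, number of 64-byte blocks processed.
    print(n, total, total // BLOCK)

# SHA-224 shares the 64-byte block but emits only 28 of its 32 state bytes.
print(hashlib.sha224().block_size, hashlib.sha224().digest_size)
```

Under these assumptions, messages of up to 55 bytes fit in a single block, while a 56-byte or 64-byte message needs a second block to hold the padding and length.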

Source: https://stackoverflow.com/questions/51253262/difference-between-block-size-and-digest-size-in-hashlib

Tags

python

python-3.x

hashlib