Question
With Python 2.7, the following code computes the MD5 hexdigest of the contents of a file.
(EDIT: not really, as the answers have shown; I only thought so.)
import hashlib
def md5sum(filename):
    f = open(filename, mode='rb')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf)
    return d.hexdigest()
Now, if I run that code with Python 3, it raises a TypeError exception:
    d.update(buf)
TypeError: object supporting the buffer API required
I figured out that I could make the code run under both Python 2 and Python 3 by changing it to:
def md5sum(filename):
    f = open(filename, mode='r')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf.encode())
    return d.hexdigest()
Now I still wonder why the original code stopped working. It seems that when a file is opened in binary mode, reading it returns integers instead of byte strings (I say that because type(buf) returns int). Is this behavior documented somewhere?
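For reference: in Python 3, `read()` on a file opened with `mode='rb'` returns a `bytes` object, and iterating over or indexing a `bytes` object yields integers (0 to 255), while slicing yields `bytes`. In Python 2, `read()` returns a `str`, whose items are one-character strings, which is why the loop appeared to work there. A minimal sketch, using `io.BytesIO` as an in-memory stand-in for a real binary file (an assumption for illustration):

```python
import io

# io.BytesIO stands in for a file opened with mode='rb'
f = io.BytesIO(b"abcd")
data = f.read(128)          # at most 128 bytes; here the whole b"abcd"

print(type(data))           # <class 'bytes'>
print(list(data))           # [97, 98, 99, 100]: iteration yields ints
print(data[0], data[0:1])   # 97 b'a': indexing gives an int, slicing gives bytes
```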
Answer 1:
I think you wanted the for loop to make successive calls to f.read(128). That can be done using iter() and functools.partial():
import hashlib
from functools import partial
def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()
print(md5sum('utils.py'))
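The two-argument form `iter(callable, sentinel)` calls `callable` repeatedly and stops as soon as it returns `sentinel`; here the sentinel is `b''`, which `read()` returns at end of file. A small sketch with an in-memory `io.BytesIO` and a 4-byte chunk size (both chosen only for illustration) shows the chunks it produces:

```python
import io
from functools import partial

f = io.BytesIO(b"abcdefghij")
# Calls f.read(4) repeatedly until it returns b'' (end of data)
chunks = list(iter(partial(f.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

The last chunk may be shorter than the requested size; only an empty result ends the loop.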
Answer 2:
for buf in f.read(128):
  d.update(buf)
...updates the hash sequentially with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects, you get calls like the following, which cause the error you encountered in Python 3:
d.update(97)
d.update(98)
d.update(99)
d.update(100)
which is not what you want.
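To make both problems concrete: passing an `int` to `update()` raises `TypeError`, and even if each int is turned back into a single byte, the loop only ever hashes the first 128 bytes of the file, because `f.read(128)` is called exactly once. A sketch on in-memory data (the 300-byte payload is arbitrary):

```python
import hashlib

data = bytes(range(256)) + b"x" * 44   # 300 arbitrary bytes standing in for a file
first = data[:128]                     # what a single f.read(128) returns

# Passing an int (what iterating over bytes yields) is rejected
try:
    hashlib.md5().update(first[0])
    raised = False
except TypeError:
    raised = True

# Feeding the bytes back one at a time hashes only those 128 bytes
d = hashlib.md5()
for b in first:
    d.update(bytes([b]))

print(raised)                                           # True
print(d.hexdigest() == hashlib.md5(first).hexdigest())  # True
print(d.hexdigest() == hashlib.md5(data).hexdigest())   # False
```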
Instead, you want:
import hashlib
def md5sum(filename):
  with open(filename, mode='rb') as f:
    d = hashlib.md5()
    while True:
      buf = f.read(4096) # read 4096-byte chunks; 128 is smaller than a typical filesystem block
      if not buf:
        break  # read() returns b'' at end of file
      d.update(buf)
    return d.hexdigest()
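Reading in chunks gives the right answer because successive `update()` calls are equivalent to a single call on the concatenated data, so the chunk size only affects memory use and speed, not the digest. A quick check on in-memory data (the data and the sizes tried are arbitrary; `md5_in_chunks` is a hypothetical helper for this demonstration):

```python
import hashlib

data = b"some file contents " * 1000   # arbitrary stand-in for file data

def md5_in_chunks(data, chunksize):
    # Feed the data to one hash object, chunksize bytes at a time
    d = hashlib.md5()
    for start in range(0, len(data), chunksize):
        d.update(data[start:start + chunksize])
    return d.hexdigest()

expected = hashlib.md5(data).hexdigest()
for size in (1, 128, 4096, 1 << 20):
    assert md5_in_chunks(data, size) == expected
print("all chunk sizes agree")
```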
Answer 3:
After asking the question, I changed my code to the version below, which I find easy to understand. But I will probably switch to the version suggested by Raymond Hettinger using functools.partial.
import hashlib
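def chunks(filename, chunksize):
    # Open in binary mode so read() returns bytes; the with-block closes the file
    with open(filename, mode='rb') as f:
        buf = f.read(chunksize)
        while buf:  # read() returns b'' at end of file
            yield buf
            buf = f.read(chunksize)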
def md5sum(filename):
    d = hashlib.md5()
    for buf in chunks(filename, 128):
        d.update(buf)
    return d.hexdigest()
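On Python 3.11 and later, the standard library also offers `hashlib.file_digest()`, which performs this chunked reading itself. A sketch that uses it when available and otherwise falls back to the `iter()`/sentinel loop; the helper name `md5sum_fileobj` and the 4096-byte chunk size are illustrative choices, and unlike the functions above it takes an already-open binary file object rather than a filename:

```python
import hashlib
import io

def md5sum_fileobj(fileobj, chunksize=4096):
    """Return the MD5 hex digest of a file object opened in binary mode."""
    if hasattr(hashlib, "file_digest"):  # added in Python 3.11
        return hashlib.file_digest(fileobj, "md5").hexdigest()
    # Older versions: read fixed-size chunks until read() returns b''
    d = hashlib.md5()
    for buf in iter(lambda: fileobj.read(chunksize), b""):
        d.update(buf)
    return d.hexdigest()

print(md5sum_fileobj(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

With a real file, open it in binary mode first, e.g. `with open('utils.py', 'rb') as f: print(md5sum_fileobj(f))`.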
Source: https://stackoverflow.com/questions/7829499/using-hashlib-to-compute-md5-digest-of-a-file-in-python-3