I am trying to compute the MD5 hash of a file with the hashlib.md5() function from the hashlib module. So I wrote this piece of code:
import hashlib
from functools import partial

def md5(filename, chunksize=2**15, bufsize=-1):
    """Return an md5 hash object for the given file, read in chunks."""
    m = hashlib.md5()
    with open(filename, 'rb', bufsize) as f:
        # iter() with a b'' sentinel keeps calling f.read(chunksize)
        # until the end of the file is reached.
        for chunk in iter(partial(f.read, chunksize), b''):
            m.update(chunk)
    return m
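The function returns the hash object itself, so you call hexdigest() on the result to get the usual 32-character hex string. A quick usage sketch (example.bin is a placeholder path, not from the original post):

digest = md5('example.bin').hexdigest()
print(digest)  # e.g. 'd41d8cd98f00b204e9800998ecf8427e' for an empty file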
A few notes on chunksize and bufsize: very big (≈2**40) chunk sizes lead to MemoryError, i.e., there is no limit other than available RAM, while bufsize is limited to 2**31-1 on my machine. A big chunksize can also be as slow as a very small one, so measure it. I find that for ≈10MB files the 2**15 chunksize is the fastest for the files I've tested.
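A minimal sketch of how such a measurement could look, assuming the md5() function above and a placeholder file name 'test.bin' (not from the original post):

import time

# Hypothetical benchmark: time md5() with a few different chunk sizes.
# 'test.bin' is a placeholder; point it at a real file to try this.
for exp in (10, 15, 20, 24):
    start = time.perf_counter()
    md5('test.bin', chunksize=2**exp)
    elapsed = time.perf_counter() - start
    print('chunksize=2**%2d: %.3f s' % (exp, elapsed))

Repeating each measurement a few times (or using the timeit module) would give more stable numbers than a single run.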