Max limit of bytes in the update() method of Python's hashlib module

误落风尘 2020-12-21 09:32

I am trying to compute the MD5 hash of a file with hashlib.md5() from the hashlib module.

So I wrote this piece of code:



        
3 Answers
  •  太阳男子
    2020-12-21 09:40

    Big (≈2**40) chunk sizes lead to MemoryError, i.e., there is no limit other than available RAM. On the other hand, bufsize is limited to 2**31-1 on my machine:

    import hashlib
    from functools import partial

    def md5(filename, chunksize=2**15, bufsize=-1):
        # chunksize is the number of bytes passed to update() per call;
        # bufsize is forwarded to open() (-1 means the default buffering).
        m = hashlib.md5()
        with open(filename, 'rb', bufsize) as f:
            # Read until f.read() returns b'' at end of file.
            for chunk in iter(partial(f.read, chunksize), b''):
                m.update(chunk)
        return m
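
    The function returns a regular hashlib hash object, so usage (with a hypothetical file name) looks like:

    # 'bigfile.iso' is just a placeholder name; substitute any local file.
    print(md5('bigfile.iso').hexdigest())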
    

    A big chunksize can be as slow as a very small one. Measure it.

    For the ≈10 MB files I've tested, a chunksize of 2**15 was the fastest.
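
    To measure it yourself, a minimal sketch (assuming the md5 function above and a hypothetical local file test.bin) is to time a few candidate chunk sizes:

    import time

    # 'test.bin' is a placeholder; point this at any reasonably large file.
    FILENAME = 'test.bin'

    # Time the md5 function defined above with several chunk sizes.
    for chunksize in (2**12, 2**15, 2**20, 2**25):
        start = time.perf_counter()
        digest = md5(FILENAME, chunksize=chunksize).hexdigest()
        elapsed = time.perf_counter() - start
        print(f'chunksize=2**{chunksize.bit_length() - 1}: {elapsed:.3f}s  {digest}')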
