Hash algorithm for dynamic growing/streaming data?

南笙 2020-12-10 16:38

Are there any algorithms that let you continue hashing from a known hash digest? For example, the client uploads a chunk of a file to ServerA, I can get a

3 answers
  •  不知归路
    2020-12-10 17:02

    Not from the known digest, but from the known state. You can use a pure-Python MD5 implementation and save its state. Here is an example using _md5.py from PyPy:

    import _md5
    
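    def md5_getstate(md):
        # Snapshot the internal state; "+ []" copies the lists so the
        # snapshot does not alias the live object.
        return (md.A, md.B, md.C, md.D, md.count + [], md.input + [], md.length)
    
    def md5_continue(state):
        # Create a fresh object and overwrite its state with the snapshot.
        md = _md5.new()
        (md.A, md.B, md.C, md.D, md.count, md.input, md.length) = state
        return md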
    
    m1 = _md5.new()
    m1.update("hello, ")
    state = md5_getstate(m1)
    m2 = md5_continue(state)
    m2.update("world!")
    print m2.hexdigest()
    
    m = _md5.new()
    m.update("hello, world!")
    print m.hexdigest()
    

    As e.dan noted, you can also use almost any checksumming algorithm (CRC, Adler, Fletcher), but these only protect against random errors, not against intentional modification of the data.
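    To illustrate the checksum route, here is a minimal sketch using the standard zlib module, whose crc32 and adler32 functions accept a running value as their second argument. For these algorithms the checksum itself is the whole state, so the value computed on one server can be handed to another. The helper names crc32_continue and adler32_continue are made up for this example, and the sketch assumes Python 3 (bytes literals).

```python
import zlib

def crc32_continue(chunk, running=0):
    # zlib.crc32 takes the checksum so far as its second argument;
    # 0 is the starting value for an empty stream.
    return zlib.crc32(chunk, running) & 0xFFFFFFFF

def adler32_continue(chunk, running=1):
    # Adler-32 starts from 1 rather than 0.
    return zlib.adler32(chunk, running) & 0xFFFFFFFF

# "ServerA" sees the first chunk and keeps only the checksum value.
crc_part = crc32_continue(b"hello, ")
adler_part = adler32_continue(b"hello, ")

# "ServerB" continues from that value with the next chunk.
crc_full = crc32_continue(b"world!", crc_part)
adler_full = adler32_continue(b"world!", adler_part)

# Both match a single pass over the whole input.
print(crc_full == (zlib.crc32(b"hello, world!") & 0xFFFFFFFF))
print(adler_full == (zlib.adler32(b"hello, world!") & 0xFFFFFFFF))
```

    As noted above, this only guards against accidental corruption; it is not a substitute for a cryptographic hash.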

    EDIT: Of course, you can also re-implement the serialization method from the thread you referenced in a more portable way (without magic constants) using ctypes. I believe this should be version- and architecture-independent (tested on Python 2.4 to 2.7, on both i386 and x86_64):

    # based on idea from http://groups.google.com/group/comp.lang.python/msg/b1c5bb87a3ff5e34
    
    try:
        import _md5 as md5
    except ImportError:
        # python 2.4
        import md5
    import ctypes
    
    def md5_getstate(md):
        if type(md) is not md5.MD5Type:
            raise TypeError('not an MD5Type instance')
        # Copy the raw bytes of the C struct that follow the generic
        # object header; they hold the MD5 context.
        return ctypes.string_at(id(md) + object.__basicsize__,
                                md5.MD5Type.__basicsize__ - object.__basicsize__)
    
    def md5_continue(state):
        md = md5.new()
        assert len(state) == md5.MD5Type.__basicsize__ - object.__basicsize__, \
               'invalid state'
        # Overwrite the fresh object's context with the saved bytes.
        ctypes.memmove(id(md) + object.__basicsize__,
                       ctypes.c_char_p(state),
                       len(state))
        return md
    
    m1 = md5.new()
    m1.update("hello, ")
    state = md5_getstate(m1)
    m2 = md5_continue(state)
    m2.update("world!")
    print m2.hexdigest()
    
    m = md5.new()
    m.update("hello, world!")
    print m.hexdigest()
    

    This code is not Python 3 compatible, since Python 3 does not have an _md5/md5 module.

    Unfortunately, hashlib's openssl_md5 implementation is not suitable for such hacks, since the OpenSSL EVP API does not provide any calls to reliably serialize EVP_MD_CTX objects.
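    For completeness, hashlib objects do expose a copy() method, which duplicates the internal state inside the running process. It does not help with moving the state to another server, but it shows the "state, not digest" point in a form that runs on Python 3. This is only a sketch and assumes MD5 is available in the local hashlib build.

```python
import hashlib

# Hash the first chunk.
base = hashlib.md5()
base.update(b"hello, ")

# copy() clones the internal state; the clone can keep hashing
# while the original stays at the first chunk.
continued = base.copy()
continued.update(b"world!")

one_shot = hashlib.md5(b"hello, world!")

print(continued.hexdigest() == one_shot.hexdigest())
print(base.hexdigest() == hashlib.md5(b"hello, ").hexdigest())
```

    The clone lives only in memory, so the limitation described above still applies when the state has to travel between machines.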
