Compute hash of only the core image data (excluding metadata) for an image

花落未央 2020-12-13 20:31

I'm writing a script to calculate the MD5 sum of an image, excluding its EXIF data.

In order to do this accurately, I need to know where the EXIF tag is located in th
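To address the question as asked: in a JPEG file, EXIF data is stored in an APP1 segment (marker `0xFFE1`, payload beginning with `Exif\0\0`) among the header segments that precede the compressed scan data. Below is a minimal sketch that walks those segments and hashes everything except the EXIF one. It assumes a baseline JPEG laid out as SOI, header segments, SOS, then scan data; the function names are hypothetical, and other metadata (XMP in a second APP1, ICC profiles in APP2, COM comments) is still included in the hash.

```python
import hashlib
import struct
from typing import Iterator, Tuple


def jpeg_header_segments(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (marker, start, end) byte ranges for each segment after SOI.

    Walking stops at SOS (0xDA); everything from SOS to the end of the file
    (compressed scan data plus EOI) is yielded as one final range.
    Assumes no standalone markers (RSTn, TEM) appear before SOS.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker == 0xDA:  # start of scan: compressed image data follows
            yield marker, pos, len(data)
            return
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        yield marker, pos, end
        pos = end


def md5_without_exif(data: bytes) -> str:
    """MD5 of a JPEG's bytes with any APP1 "Exif" segment left out."""
    h = hashlib.md5(data[:2])  # start with the SOI marker
    for marker, start, end in jpeg_header_segments(data):
        segment = data[start:end]
        # Bytes 0-1 are the marker, 2-3 the length, 4-9 the "Exif\0\0" identifier.
        if marker == 0xE1 and segment[4:10] == b"Exif\x00\x00":
            continue
        h.update(segment)
    return h.hexdigest()
```

Usage would look like `md5_without_exif(open("foo.jpg", "rb").read())`. Unlike hashing decoded pixels, this hashes the compressed bytes, so re-saving the same picture with a different encoder or quality setting changes the result.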

4 answers
  •  感动是毒
    2020-12-13 20:42

    It is much easier to use the Python Imaging Library (PIL, now maintained as Pillow) to extract the pixel data (example in IPython):

    In [1]: from PIL import Image
    
    In [2]: import hashlib
    
    In [3]: im = Image.open('foo.jpg')
    
    In [4]: hashlib.md5(im.tobytes()).hexdigest()
    Out[4]: '171e2774b2549bbe0e18ed6dcafd04d5'
    

    This works on any image format that PIL can read. The tobytes() method returns a bytes object containing only the raw pixel data, so metadata such as EXIF does not affect the hash.
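One caveat with hashing tobytes() alone: the bytes carry no dimensions or mode, so, for example, a 2x3 and a 3x2 grayscale image with the same six pixel values produce the same hash. The following is a minimal sketch that folds the mode and size into the digest. `pixel_digest` is a hypothetical helper; it takes plain values so it can be called as `pixel_digest(im.mode, im.size, im.tobytes())`.

```python
import hashlib
from typing import Tuple


def pixel_digest(mode: str, size: Tuple[int, int], raw: bytes, algo: str = "sha512") -> str:
    """Hash raw pixel bytes together with the image mode and dimensions.

    `raw` is expected to be the output of PIL's Image.tobytes(); `mode` and
    `size` come from Image.mode and Image.size.
    """
    h = hashlib.new(algo)
    # A small header so identical byte strings with different shapes differ.
    h.update(f"{mode}:{size[0]}x{size[1]}:".encode("ascii"))
    h.update(raw)
    return h.hexdigest()
```

Two further limits apply to either version: for palette ("P") images tobytes() returns palette indices, so the palette itself is not hashed, and for animated images only the current frame is covered.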

    BTW, MD5 is now considered cryptographically weak (practical collision attacks exist). Better to use SHA-512:

    In [6]: hashlib.sha512(im.tobytes()).hexdigest()
    Out[6]: '6361f4a2722f221b277f81af508c9c1d0385d293a12958e2c56a57edf03da16f4e5b715582feef3db31200db67146a4b52ec3a8c445decfc2759975a98969c34'
    

    On my machine, calculating the MD5 checksum for a 2500x1600 JPEG takes around 0.07 seconds. Using SHA-512, it takes 0.10 seconds. Complete example:
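To get comparable numbers on your own machine, one rough approach is to time the hash functions alone on a buffer the size of a decoded 2500x1600 RGB image. This sketch uses random bytes as a stand-in, so it measures only hashing, not the JPEG decoding that tobytes() triggers; `time_digest` is a hypothetical helper name.

```python
import hashlib
import os
import timeit


def time_digest(algo: str, data: bytes, repeat: int = 3) -> float:
    """Best-of-`repeat` wall time in seconds for a single digest of `data`."""
    timer = timeit.Timer(lambda: hashlib.new(algo, data).digest())
    return min(timer.repeat(repeat=repeat, number=1))


# Stand-in for a decoded 2500x1600 RGB image: 3 bytes per pixel of random data.
payload = os.urandom(2500 * 1600 * 3)
for name in ("md5", "sha512"):
    print(f"{name}: {time_digest(name, payload):.3f} s")
```

Results depend heavily on the CPU and the OpenSSL build behind hashlib, so treat them as relative rather than absolute figures.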

    #!/usr/bin/env python3
    
    from PIL import Image
    import hashlib
    import sys
    
    # Decode the image named on the command line and hash only its pixel data.
    im = Image.open(sys.argv[1])
    print(hashlib.sha512(im.tobytes()).hexdigest(), end="")
    

    For movies, you can extract frames from them with e.g. ffmpeg, and then process them as shown above.
