Compute hash of only the core image data (excluding metadata) for an image

花落未央 2020-12-13 20:31

I'm writing a script to calculate the MD5 sum of an image excluding the EXIF tag.

In order to do this accurately, I need to know where the EXIF tag is located in th

4 answers
  •  旧时难觅i
    2020-12-13 20:41

    One simple way to do it is to hash only the core image data. For PNG, you could do this by hashing only the "critical chunks" (i.e. the ones whose type name starts with a capital letter). JPEG has a similar but simpler file structure.
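As a rough sketch of the "critical chunks" idea: a PNG chunk is critical when the first letter of its four-byte type is uppercase, which means bit 5 of that byte is clear. The names `is_critical` and `hash_critical_chunks` are made up for illustration, and hashing the chunk type together with the data is one possible choice, not the only one.

```python
import hashlib
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def is_critical(chunk_type):
    """True when the first byte of the chunk type is uppercase (bit 5 clear)."""
    return chunk_type[0] & 0x20 == 0

def hash_critical_chunks(fh):
    """Return the MD5 hex digest over the type and data of every critical chunk."""
    md5 = hashlib.md5()
    if fh.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    while True:
        header = fh.read(8)          # 4-byte length + 4-byte chunk type
        if len(header) < 8:
            break                    # end of file
        length, chunk_type = struct.unpack(">I4s", header)
        data = fh.read(length)
        fh.read(4)                   # skip the CRC
        if is_critical(chunk_type):  # IHDR, PLTE, IDAT, IEND
            md5.update(chunk_type)
            md5.update(data)
    return md5.hexdigest()
```

Ancillary chunks such as `tEXt` or `eXIf` start with a lowercase letter, so they are skipped and do not affect the digest.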

    The visual hash in ImageMagick decompresses the image as it hashes it. In your case, you could hash the compressed image data directly, so (if implemented correctly) it should be just as quick as hashing the raw file.

    This is a small Python script illustrating the idea. It may or may not work for you, but it should at least give an indication of what I mean :)

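    import hashlib
    import os
    import struct
    
    def png(fh):
        md5 = hashlib.md5()
        # PNG signature is \x89 P N G \r \n \x1a \n
        assert fh.read(8)[1:4] == b"PNG"
        while True:
            try:
                # Chunk length is an unsigned 32-bit big-endian integer
                length, = struct.unpack(">I", fh.read(4))
            except struct.error:
                break  # fewer than 4 bytes left: end of file
            if fh.read(4) == b"IDAT":
                md5.update(fh.read(length))
                fh.read(4)  # skip CRC
            else:
                fh.seek(length + 4, os.SEEK_CUR)  # skip data and CRC
        print("Hash: %s" % md5.hexdigest())
    
    def jpeg(fh):
        md5 = hashlib.md5()
        assert fh.read(2) == b"\xff\xd8"  # SOI marker
        while True:
            marker, length = struct.unpack(">2H", fh.read(4))
            assert marker & 0xff00 == 0xff00
            if marker == 0xFFDA:  # Start of Scan
                md5.update(fh.read())  # hash everything up to end of file
                break
            else:
                # The length field counts its own 2 bytes
                fh.seek(length - 2, os.SEEK_CUR)
        print("Hash: %s" % md5.hexdigest())
    
    
    if __name__ == '__main__':
        # Open in binary mode so the bytes comparisons above work
        with open("sample.png", "rb") as fh:
            png(fh)
        with open("sample.jpg", "rb") as fh:
            jpeg(fh)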
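To check the approach against the original goal, a sketch along these lines returns the digest instead of printing it and compares a JPEG-like byte stream with and without an APP1 (EXIF) segment. The byte streams are synthetic placeholders rather than real images, and `jpeg_scan_md5` and `segment` are hypothetical names used only here.

```python
import hashlib
import io
import os
import struct

def jpeg_scan_md5(fh):
    """Return the MD5 hex digest of everything after the first Start of Scan header."""
    if fh.read(2) != b"\xff\xd8":              # SOI marker
        raise ValueError("not a JPEG file")
    while True:
        header = fh.read(4)                    # 2-byte marker + 2-byte length
        if len(header) < 4:
            raise ValueError("no Start of Scan marker found")
        marker, length = struct.unpack(">2H", header)
        if marker & 0xFF00 != 0xFF00:
            raise ValueError("expected a marker")
        if marker == 0xFFDA:                   # Start of Scan
            return hashlib.md5(fh.read()).hexdigest()
        fh.seek(length - 2, os.SEEK_CUR)       # length counts its own 2 bytes

def segment(marker, payload):
    """Build a marker segment: marker, length (payload + 2), payload."""
    return struct.pack(">2H", marker, len(payload) + 2) + payload

# Synthetic streams: same scan data, one with an extra APP1 segment.
scan = segment(0xFFDA, b"\x01\x01\x00\x00\x3f\x00") + b"fake-scan-data" + b"\xff\xd9"
plain = b"\xff\xd8" + segment(0xFFDB, b"\x00" * 65) + scan
tagged = (b"\xff\xd8" + segment(0xFFE1, b"Exif\x00\x00placeholder")
          + segment(0xFFDB, b"\x00" * 65) + scan)

print(jpeg_scan_md5(io.BytesIO(plain)) == jpeg_scan_md5(io.BytesIO(tagged)))  # prints True
```

Note that this hashes to the end of the file, so any bytes appended after the end-of-image marker would still change the digest.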
    
