Question
Does this method compare the pixel values of the images? I'm guessing it won't work for images of different sizes, but what if the images are identical and only the formats differ? For example, I took a screenshot and saved it as a .jpg, then took another and saved it as a .gif.
Answer 1:
An MD5 hash is computed from the file's actual binary data, and different formats produce completely different binary data.
So for MD5 hashes to match, the files must be byte-for-byte identical. (The exception is the rare case of a hash collision, where two different files produce the same hash.)
This is actually one way forensic law enforcement finds image files it deems contraband.
Answer 2:
It is an MD5 checksum, the same thing you often see when downloading a file: if the MD5 of the downloaded file matches the MD5 given by the provider, then the file transfer was successful (see http://en.wikipedia.org/wiki/Checksum). If there is even 1 bit of difference between the 2 files, the resulting hash will be completely different.
Because a JPG and a GIF are encoded differently, the 2 files will not have the same MD5 hash.
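To illustrate the "even 1 bit of difference" point, here is a minimal sketch using Python's standard `hashlib` module. The sample bytes and the helper name `md5_hex` are made up for this example; any input would behave the same way.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()

original = b"hello world"
# Flip the lowest bit of the first byte: 'h' (0x68) becomes 'i' (0x69).
flipped = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = md5_hex(original)
digest_b = md5_hex(flipped)

# Count how many of the 32 hex characters differ between the two digests.
differing = sum(1 for x, y in zip(digest_a, digest_b) if x != y)

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex digits differ")
```

The two inputs differ in a single bit, yet the digests share no obvious relationship, which is why a checksum match is treated as evidence of an exact copy.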
Answer 3:
Look at the raw bytes: a .gif starts with 'GIF', while a .jpg starts with the bytes FF D8 and, in the common JFIF variant, carries the string 'JFIF' a few bytes in. In other words, comparing the on-disk bytes of the "same image" in two different formats is pretty much guaranteed to produce two different MD5 hashes, since the files' contents differ, even if the actual image is the "same picture".
To do a hash-based image comparison, you have to compare two images in the same format. Even then, it would be very difficult to produce a .jpg and a .gif of the same image that compare equal after converting both to (say) a .bmp. The file format would be the same, but the internal constraints of .gif (8-bit palette, lossless LZW compression) versus those of .jpg (24-bit, lossy discrete cosine transform compression) mean it's nigh-on impossible to get the same .bmp from both source images.
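The difference in leading bytes can be checked directly. The sketch below guesses a format from those bytes; the function name `sniff_image_format` and the truncated sample headers are hypothetical, written only for illustration. Note that a JPEG stream opens with the bytes FF D8, and the 'JFIF' tag sits a few bytes further in.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess an image format from its leading bytes (magic numbers)."""
    # GIF files open with the ASCII signature "GIF87a" or "GIF89a".
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # Every JPEG stream opens with the start-of-image marker FF D8.
    if data[:2] == b"\xff\xd8":
        return "jpeg"
    return "unknown"

# Hypothetical, truncated headers; real files continue with image data.
gif_header = b"GIF89a" + bytes(4)
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

print(sniff_image_format(gif_header))
print(sniff_image_format(jpeg_header))
print(jpeg_header[6:10])  # the JFIF tag inside the first JPEG segment
```

Since the very first bytes already differ, the MD5 hashes of the two files cannot be expected to match.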
Answer 4:
If you're comparing hashes, then every single byte of the two files has to match. It is not enough for the images to "look the same", and they can't use different compression formats. The files have to be identical.
Answer 5:
You cannot compare using the MD5 sum, as all the other posters have noted. However, you can compare the images in a different way that tells you how similar they are regardless of image type, or even size. You can use libPuzzle:
http://libpuzzle.pureftpd.org/project/libpuzzle
This is a great library for image comparison and works very well.
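To give a feel for similarity-based comparison, here is a rough sketch of a much simpler idea, often called an "average hash". This is not libPuzzle's algorithm or API; it is a stand-in technique, and it assumes the images have already been decoded and shrunk to small grayscale grids of the same size (the 4x4 grids below are invented sample data).

```python
def average_hash(gray):
    """Return one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in gray for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]

def hamming_distance(bits_a, bits_b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

# Hypothetical 4x4 grayscale thumbnail (values 0-255): dark left, bright right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same picture with tiny per-pixel changes, as lossy compression might cause.
recompressed = [
    [value + ((i + j) % 3 - 1) for j, value in enumerate(row)]
    for i, row in enumerate(original)
]
# A different picture: dark top, bright bottom (the original transposed).
different = [list(column) for column in zip(*original)]

print(hamming_distance(average_hash(original), average_hash(recompressed)))
print(hamming_distance(average_hash(original), average_hash(different)))
```

A small distance suggests similar pictures and a large one suggests different pictures. Unlike MD5, tiny pixel changes do not scramble the result.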
Answer 6:
md5 is a hash algorithm, so it does not compare images; it works on data. The data you put in can be nearly anything, such as the contents of a file. It then outputs a hash string based on that content, which here is the raw data of the file.
So when you feed an image into md5, you are not comparing images but the raw data of the image file. The hash algorithm knows nothing beyond that raw data, so the hashes of a jpg and a gif (or any other image format) of the same screenshot will not match.
Even if you hash the decoded image, a lossily compressed copy will not produce the same hash, because the decoded pixels contain small differences the human eye cannot see (depending on the amount of compression used). This might be different when comparing the decoded data of losslessly encoded images, but I don't know for sure.
Take a look at the Wikipedia article for a more detailed explanation and technical background about hash functions.
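On the open question about lossless encodings: if two files decode to exactly the same pixel bytes, hashing the decoded data gives the same MD5 even though the files themselves differ. The sketch below demonstrates this with two toy containers invented for this example (`RAW0` and `ZLB0` are not real image formats). It assumes both encodings can represent the image exactly, which a palette-limited format may not.

```python
import hashlib
import zlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical raw pixel data: a 4x4 RGB image, 3 bytes per pixel.
pixels = bytes(range(48))

# Two toy "file formats" (assumptions, not real image formats):
# one stores the pixels raw, the other stores them zlib-compressed.
file_a = b"RAW0" + pixels
file_b = b"ZLB0" + zlib.compress(pixels)

def decode(file_bytes: bytes) -> bytes:
    """Recover the pixel bytes from either toy container."""
    tag, body = file_bytes[:4], file_bytes[4:]
    if tag == b"RAW0":
        return body
    if tag == b"ZLB0":
        return zlib.decompress(body)
    raise ValueError("unknown container")

files_match = md5_hex(file_a) == md5_hex(file_b)
pixels_match = md5_hex(decode(file_a)) == md5_hex(decode(file_b))

print(files_match)   # the on-disk bytes differ
print(pixels_match)  # the decoded pixels are identical
```

With a lossy format such as JPEG, the decoded pixels themselves change, so even this decoded-data comparison would fail.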
Answer 7:
md5 is a hash. It is a code that is calculated from a bunch of data - any data really.
An md5 code is certainly not unique, but the chance that two different images have the exact same code is quite small. Therefore you could compare image files by calculating an md5 code for each of them and comparing the codes.
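A minimal sketch of that file-level comparison, using Python's standard `hashlib`. The helper names `file_md5` and `same_file_contents`, the chunk size, and the temporary demo files are all assumptions for this example. It only answers "are these files byte-for-byte the same?", not "do these pictures look alike?".

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=65536):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_file_contents(path_a, path_b):
    """Return True when both files have the same MD5 code."""
    return file_md5(path_a) == file_md5(path_b)

# Demo with throwaway files standing in for image files.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "first.bin")
    copy = os.path.join(folder, "copy.bin")
    other = os.path.join(folder, "other.bin")
    samples = ((first, b"same bytes"), (copy, b"same bytes"), (other, b"other bytes"))
    for path, payload in samples:
        with open(path, "wb") as handle:
            handle.write(payload)
    copy_matches = same_file_contents(first, copy)
    other_matches = same_file_contents(first, other)

print(copy_matches, other_matches)
```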
Answer 8:
It will still not work. Any image file contains a header portion and the binary image buffer. In the scenario described:
1. The headers will differ between .jpg and .gif, resulting in a different md5 sum.
2. The image buffer itself may differ because of the image compression used by, say, the .jpg format.
Source: https://stackoverflow.com/questions/4853185/how-does-comparing-images-through-md5-work