I have a problem in my web crawler, where I am trying to retrieve images from a particular website. The problem is that I often see images that are exactly the same but different in
Hashing has already been suggested, and recognizing whether two files are identical is very easy, but you said pixel level. If you want to recognize two images as the same even if they are in different formats (.png/.jpg/.gif/...) and even if they were scaled, I suggest the following (using an image library, and assuming the images are medium or large, not 16x16 icons):
Sum the differences (in absolute value, so they do not cancel out) between the corresponding grey pixels of both images. This gives you a single number: if it is < T, where T is a threshold you choose, you consider both images identical.
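
As a rough sketch of that comparison in Python: the function names, the 64x64 working size, and the use of Pillow for loading are my own assumptions, not something fixed by the idea itself. The comparison works on plain lists of grey values, so any image library can supply them.

```python
from typing import List, Sequence, Tuple


def grey_difference(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between corresponding grey pixels."""
    if len(a) != len(b):
        raise ValueError("both images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b))


def looks_identical(a: Sequence[int], b: Sequence[int], threshold: int) -> bool:
    """True if the total grey difference is below the threshold T."""
    return grey_difference(a, b) < threshold


def load_grey_pixels(path: str, size: Tuple[int, int] = (64, 64)) -> List[int]:
    """Hypothetical helper: load an image as a flat list of grey values.

    Assumes Pillow is installed. Both images are scaled to the same
    (assumed) size so that pixels line up even if one copy was resized.
    """
    from PIL import Image  # imported here so the functions above need no extra library

    with Image.open(path) as img:
        grey = img.convert("L").resize(size)  # "L" = 8-bit greyscale
        return list(grey.tobytes())           # one byte (0..255) per pixel


if __name__ == "__main__":
    # Tiny hand-made "images" of four grey pixels each.
    first = [10, 20, 30, 40]
    second = [12, 18, 30, 45]
    print(grey_difference(first, second))      # 2 + 2 + 0 + 5 = 9
    print(looks_identical(first, second, 10))  # True, because 9 < 10
```

A good T depends on the working size and on how much compression noise you want to tolerate, so it probably has to be tuned on a few known duplicates from the site you are crawling.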
--