I have a problem in my web crawler, where I am trying to retrieve images from a particular website. The problem is that I often see images that are exactly the same but different in
You could also generate an MD5 signature of each file and ignore duplicate entries. That only catches byte-for-byte identical files, though; it won't help you find similar images.
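Here is a rough sketch of the MD5 idea in Python, assuming your crawler already has each downloaded image as raw bytes. The names `md5_of_bytes` and `DuplicateFilter` are made up for illustration, so adapt them to whatever your crawler looks like.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of the raw file contents."""
    return hashlib.md5(data).hexdigest()


class DuplicateFilter:
    """Hypothetical helper: remembers the digests of files seen so far."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, data: bytes) -> bool:
        """Return True the first time this exact content is seen, False afterwards."""
        digest = md5_of_bytes(data)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


if __name__ == "__main__":
    seen = DuplicateFilter()
    # Stand-in byte strings; in a crawler these would be the downloaded image bodies.
    for body in (b"image-one", b"image-one", b"image-two"):
        print(md5_of_bytes(body), "keep" if seen.is_new(body) else "skip duplicate")
```

Because the digest is computed over the bytes, two copies of the same picture that differ in even one byte (re-encoded, resized, different metadata) will hash differently and both be kept.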