Is there a hashing algorithm that is tolerant of minor differences?
Question

I'm doing some web crawling type stuff where I'm looking for certain terms in webpages and finding their location on the page, and then caching it for later use. I'd like to be able to check the page periodically for any major changes. Something like MD5 can be foiled by simply putting the current date and time on the page. Are there any hashing algorithms that work for something like this?

Answer 1:

A common way to do document similarity is shingling, which is somewhat more involved than hashing.
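To make the shingling idea concrete, here is a minimal sketch in Python. It is not from the answer itself. It assumes word-level shingles of length `k = 4` and Jaccard similarity as the comparison measure, and the function names (`shingles`, `jaccard`, `similarity`) are made up for illustration.

```python
import re


def shingles(text, k=4):
    """Return the set of k-word shingles (overlapping word windows) in text.

    Assumption: words are lowercased runs of \\w characters; k=4 is an
    arbitrary illustrative choice, not a recommended value.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty documents are considered identical
    return len(a & b) / len(a | b)


def similarity(old_text, new_text, k=4):
    """Similarity in [0, 1] between two page texts, based on shared shingles."""
    return jaccard(shingles(old_text, k), shingles(new_text, k))
```

A crawler could store the shingle set (or the page text) from the last visit and flag a "major change" when `similarity` drops below some threshold that you would have to tune for your pages. A changed timestamp only affects the few shingles that overlap it, so the score stays close to 1. For large collections, the shingle sets are usually compressed with a technique such as MinHash rather than compared directly.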