Locality Sensitive Hash Implementation? [closed]

巧了我就是萌 提交于 2020-01-20 14:23:05

问题


Are there any relatively simple to understand (and simple to implement) locality-sensitive hash examples in C/C++/Java/C#?

I'd like to learn more about the concept and so want to try an implementation on a few text files just to see how it works, so I don't need anything high-performance or anything... just an example of a hash function that returns similar hashes for similar inputs. I can learn more from it by example afterwards. :)


回答1:


For strings you can use approximate matching algorithm.

  • Generate a random string
  • For all the strings compute their distance from that random shared string using an algorithm like http://www.dotnetperls.com/levenshtein

If the strings are equidistant from a reference string then chances are that they are similar to each other. And there you go you have a locality senitive hash implementation for strings.

You can create different hash buckets for a range of distances.

EDIT: You can try other variations of string distance. A simpler algorithm would just return no. of common characters between two strings.




回答2:


Well there is an excellent into article at MSDN blogs here: http://blogs.msdn.com/b/spt/archive/2008/06/11/locality-sensitive-hashing-lsh-and-min-hash.aspx

Also there is at least once C++ library which you can inspect the source code of here: http://sourceforge.net/projects/lshkit/




回答3:


There is also a Java Implementation on Hadoop. it does a good job on documents.

it's called LikeLike

Currently Likelike supports only Min-Wise independent permutations. Min-Wise independent permutations is applied to the recommendation of Google News




回答4:


I realise you explicitly asked for C/C++/C#, but there is a Python port of the nilsimsa hash which might be easier to grok than other, larger libraries.



来源:https://stackoverflow.com/questions/5769949/locality-sensitive-hash-implementation

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!