Generating Random Hash Functions for LSH Minhash Algorithm

夕颜 2020-12-16 08:33

I'm programming a minhashing algorithm in Java that requires me to generate an arbitrary number of random hash functions (240 hash functions in my case), and run any number

2 Answers
  • 2020-12-16 08:41

    So the method I described above was almost correct. The numbers a and b should be randomly generated, but c needs to be a prime number slightly larger than the maximum possible value of x. Once those numbers have been chosen, computing the hash value as h = (a*x + b) % c is the standard, accepted way to generate hash functions.

    Also, a and b should be random integers in the range 1 to c-1.
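
    As a rough illustration of the recipe above, here is a minimal Java sketch. The class name MinHashFunctions and the parameters count, maxX and seed are made up for this example and are not from the original post. It assumes element ids are non-negative ints no larger than maxX, picks c with BigInteger.nextProbablePrime(), and does the multiplication in long arithmetic so that a*x cannot overflow.

    ```java
    import java.math.BigInteger;
    import java.util.Arrays;
    import java.util.Random;

    final class MinHashFunctions {
        final long c;      // prime modulus, strictly greater than maxX
        final long[] a;    // multipliers, each in [1, c-1]
        final long[] b;    // offsets, each in [1, c-1]

        // count, maxX and seed are illustrative parameters (assumptions).
        MinHashFunctions(int count, int maxX, long seed) {
            // First (probable) prime above maxX; BigInteger does the search.
            this.c = BigInteger.valueOf(maxX).nextProbablePrime().longValue();
            this.a = new long[count];
            this.b = new long[count];
            Random rng = new Random(seed);
            for (int i = 0; i < count; i++) {
                // floorMod maps a random long into [0, c-2]; adding 1 gives [1, c-1].
                // The tiny modulo bias is ignored in this sketch.
                a[i] = 1 + Math.floorMod(rng.nextLong(), c - 1);
                b[i] = 1 + Math.floorMod(rng.nextLong(), c - 1);
            }
        }

        // h_i(x) = (a_i * x + b_i) % c, computed with longs so the product
        // stays in range for 0 <= x <= maxX <= Integer.MAX_VALUE.
        long hash(int i, int x) {
            return (a[i] * x + b[i]) % c;
        }

        // MinHash signature: for each hash function, the minimum hash over the set.
        long[] signature(int[] set) {
            long[] sig = new long[a.length];
            Arrays.fill(sig, Long.MAX_VALUE);
            for (int x : set) {
                for (int i = 0; i < sig.length; i++) {
                    sig[i] = Math.min(sig[i], hash(i, x));
                }
            }
            return sig;
        }

        public static void main(String[] args) {
            MinHashFunctions f = new MinHashFunctions(240, 1_000_000, 42L);
            long[] sig = f.signature(new int[] {3, 17, 99_999});
            System.out.println("c = " + f.c + ", first component = " + sig[0]);
        }
    }
    ```

    Two sets can then be compared by counting the positions where their signatures agree and dividing by the signature length, which gives the usual MinHash estimate of their Jaccard similarity.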

  • 2020-12-16 09:01

    When I was working with Bloom filters a few years ago, I ran across a paper that describes how to generate multiple hash functions very simply, with a minimum of code. The method it describes works very well. See "Less Hashing, Same Performance: Building a Better Bloom Filter" by Kirsch and Mitzenmacher.

    The basic idea is to create two hash functions, call them h1 and h2, with which you can then simulate multiple hash functions, g1 through gk, using the formula:

    gi(x) = h1(x) + i*h2(x)
    

    i varies from 1 to k (the number of hash functions you want).
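
    A minimal Java sketch of that formula follows. The class name DoubleHashing and the way h1 and h2 are obtained (the low and high halves of one 64-bit mix in the style of SplitMix64) are assumptions made for this example, not something the paper or this answer prescribes; any two reasonably independent hash functions could stand in. A Bloom filter would normally reduce each value modulo its table size, a step left out here. Whether hash functions derived this way are good enough for MinHash is not established by this answer, so treat the sketch as a starting point for experiments.

    ```java
    final class DoubleHashing {
        // Hypothetical 64-bit mixer (the SplitMix64 step); any good
        // 64-bit hash of the input could be used instead.
        static long mix64(long x) {
            long z = x + 0x9e3779b97f4a7c15L;
            z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
            z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
            return z ^ (z >>> 31);
        }

        // h1: low 32 bits of the mixed value, as a non-negative long.
        static long h1(long x) {
            return mix64(x) & 0xffffffffL;
        }

        // h2: high 32 bits of the mixed value, as a non-negative long.
        static long h2(long x) {
            return mix64(x) >>> 32;
        }

        // Returns g_1(x) .. g_k(x), where g_i(x) = h1(x) + i * h2(x).
        static long[] hashes(long x, int k) {
            long first = h1(x);
            long step = h2(x);
            long[] g = new long[k];
            for (int i = 1; i <= k; i++) {
                g[i - 1] = first + i * step;
            }
            return g;
        }

        public static void main(String[] args) {
            // Simulate 240 hash functions of one element with two base hashes.
            long[] g = hashes(12345L, 240);
            System.out.println("g_1 = " + g[0] + ", g_240 = " + g[239]);
        }
    }
    ```

    Only two base hashes are computed per element, no matter how large k is; the remaining values cost one multiplication and one addition each.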

    The paper is well worth reading, even if you decide not to implement the idea, although after reading it I can't imagine not wanting to. It made my Bloom filter code a whole lot more tractable and didn't hurt performance.
