Confusion in hashing used by LSH

难免孤独 2020-12-12 04:23

Matrix M is the signature matrix, produced by minhashing the actual data (which has documents as columns and words as rows). So a column represe

1 Answer
  • 2020-12-12 04:40

    I think I figured it out; posting for future readers.

    I am going to use a single dictionary, since the slides mention that it's OK to use the same hash function for every stripe (dictionaries do that).

    Every bucket will be a key in our dictionary.

    On insertion, a document (i.e. the part of a column that falls within a stripe) is passed through a hash function (which we will create), and the result is the key. That is how the dictionary gets populated.
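
The idea above can be sketched roughly as follows. This is only a minimal illustration, not the exact method from the slides: the names `lsh_buckets`, `candidate_pairs` and `rows_per_stripe` are hypothetical, the signatures are made-up toy values, and Python's built-in hashing of tuples stands in for the hand-written hash function. Putting the stripe index into the key is also an assumption, made so that equal values in different stripes do not land in the same bucket.

```python
from collections import defaultdict


def lsh_buckets(signatures, rows_per_stripe):
    """Fill one dictionary whose keys are buckets and whose values are document ids.

    signatures: dict mapping a document id to its signature (a list of ints).
    rows_per_stripe: number of signature rows that make up one stripe.
    """
    buckets = defaultdict(set)  # the single dictionary shared by all stripes
    for doc_id, signature in signatures.items():
        for start in range(0, len(signature), rows_per_stripe):
            # The slice of this document's column that falls within the stripe
            chunk = tuple(signature[start:start + rows_per_stripe])
            # Assumption: the stripe index is part of the key, so identical
            # values in different stripes map to different buckets.
            key = (start // rows_per_stripe, chunk)
            buckets[key].add(doc_id)  # the dict hashes the key for us
    return buckets


def candidate_pairs(buckets):
    """Return every pair of documents that share at least one bucket."""
    pairs = set()
    for docs in buckets.values():
        ordered = sorted(docs)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


# Toy signatures (invented for illustration), two stripes of two rows each
toy_signatures = {
    "d1": [1, 2, 3, 4],
    "d2": [1, 2, 9, 9],
    "d3": [7, 7, 7, 7],
}
print(candidate_pairs(lsh_buckets(toy_signatures, 2)))  # {('d1', 'd2')}
```

Here `d1` and `d2` agree on the first stripe, so they end up under the same key and become a candidate pair, while `d3` shares no bucket with either of them.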
