Fast disk-based hashtables?

伪装坚强ぢ 2020-12-04 11:30

I have sets of hashes (the first 64 bits of MD5, so they're distributed very randomly), and I want to be able to check whether a new hash is in a set and to add it to a set.


6 answers
  •  青春惊慌失措
    2020-12-04 12:16

    Two algorithms come to my mind at first:

    • Use a b-tree.
    • Separate-chain the hashes themselves: use the first 10 bits of each hash to index into one of 1024 individual files, each of which contains a sorted list of all the hashes starting with those 10 bits. That gives you a constant-time jump to a block that ought to fit in memory, and an O(log n) binary search once you've loaded that block. (Or you could use 8 bits to index into 256 files, and so on.)
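
    The second idea could be sketched roughly as follows in Python. This is only an illustration under assumptions that are not in the question: the names `BucketedHashSet` and `hash64` are made up, each bucket file stores big-endian unsigned 64-bit integers in sorted order, and every insert rewrites the whole bucket file, which is simple but not the fastest option.

```python
import bisect
import hashlib
import os
import struct

BUCKET_BITS = 10  # first 10 bits of the hash select one of 1024 files


def hash64(data: bytes) -> int:
    """First 64 bits of the MD5 digest, as an unsigned integer."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


class BucketedHashSet:
    """Hypothetical on-disk set: one sorted file per hash prefix."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, h: int) -> str:
        # Constant-time jump: the top bits of the hash name the bucket file.
        bucket = h >> (64 - BUCKET_BITS)
        return os.path.join(self.directory, f"{bucket:04d}.bin")

    def _load(self, path: str) -> list:
        # Read the whole bucket into memory; a missing file is an empty bucket.
        if not os.path.exists(path):
            return []
        with open(path, "rb") as f:
            data = f.read()
        return [value for (value,) in struct.iter_unpack(">Q", data)]

    def contains(self, h: int) -> bool:
        values = self._load(self._path(h))
        i = bisect.bisect_left(values, h)  # O(log n) search in the sorted block
        return i < len(values) and values[i] == h

    def add(self, h: int) -> bool:
        """Insert h; return False if it was already present."""
        path = self._path(h)
        values = self._load(path)
        i = bisect.bisect_left(values, h)
        if i < len(values) and values[i] == h:
            return False
        values.insert(i, h)  # keep the block sorted
        with open(path, "wb") as f:
            f.write(struct.pack(f">{len(values)}Q", *values))
        return True
```

    Rewriting a bucket on every insert keeps the sketch short. If inserts are frequent, it would probably be better to batch new hashes in memory and merge them into the bucket files periodically.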
