Fast disk-based hashtables?

伪装坚强ぢ 2020-12-04 11:30

I have sets of hashes (the first 64 bits of MD5, so they're effectively uniformly distributed), and I want to be able to check whether a new hash is in a set and to add it to a set.


6 answers
  •  遥遥无期
    2020-12-04 11:55

    I had some trouble picturing your exact problem/need, but it still got me thinking about Git and how it stores objects on disk, keyed by their SHA-1 hashes:

    Take the hexadecimal string representation of a given hash, say, "abfab0da6f4ebc23cb15e04ff500ed54". Chop off the first two characters of the hash ("ab", in our case) and use them as a directory name. Then use the rest ("fab0da6f4ebc23cb15e04ff500ed54") as the file name, create the file, and put stuff in it.

    This way, you get pretty decent on-disk performance (depending on your filesystem, naturally) with automatic indexing. You also get direct access to any known hash, just by wedging a directory separator in after the first two characters ("./ab/fab0da[..]").
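
    To make the layout concrete, here is a minimal Python sketch of the idea above, not a tuned implementation. The function names (`hash_path`, `contains`, `add`, `md5_prefix64`) and the use of empty marker files are my own assumptions for illustration. The question only says "first 64 bits of MD5", which I take to mean the first 16 hex characters of the digest.

```python
import hashlib
import os
import tempfile


def md5_prefix64(data):
    """First 64 bits of the MD5 digest, as 16 hex characters (assumed encoding)."""
    return hashlib.md5(data).hexdigest()[:16]


def hash_path(root, hex_hash):
    """Map a hex hash to <root>/<first two chars>/<remaining chars>."""
    return os.path.join(root, hex_hash[:2], hex_hash[2:])


def contains(root, hex_hash):
    """Membership test: the hash is in the set if its file exists."""
    return os.path.exists(hash_path(root, hex_hash))


def add(root, hex_hash):
    """Add a hash to the set; return True if it was new, False if present."""
    path = hash_path(root, hex_hash)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        # Mode "x" fails if the file already exists, so the check and the
        # insert happen in a single filesystem operation.
        with open(path, "x"):
            pass
        return True
    except FileExistsError:
        return False


if __name__ == "__main__":
    root = tempfile.mkdtemp()      # one directory tree per set
    h = md5_prefix64(b"hello")
    print(contains(root, h))       # not added yet
    print(add(root, h))            # first insert
    print(add(root, h))            # duplicate insert
    print(contains(root, h))
```

    Each set gets its own root directory, and two hex characters give at most 256 subdirectories per set. Whether one file per hash is fast enough depends heavily on the filesystem and on how many hashes you store, so treat this as a starting point rather than a benchmark-backed answer.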

    I'm sorry if I missed the ball entirely, but with any luck, this might give you an idea.
