Need memory efficient way to store tons of strings (was: HAT-Trie implementation in java)

前端 未结 4 2119
野的像风
野的像风 2020-12-24 02:15

I am working with a large set (5-20 million) of String keys (average length 10 chars) which I need to store in an in memory data structure that supports th

4条回答
  •  执笔经年
    2020-12-24 02:30

    The trie seems like a very good idea for your constraints.

    A "thinking outside the box" alternative:

    If you can afford some probability of answering "present" for a string that is absent

    EDIT: if you can afford false positives, use a Bloom filter as suggested by WizardOfOdds in the comments.

    For k=1, a Bloom filter is like a hash table without the keys: each "bucket" is simply a boolean that tells if at least one input with the same hash was present. If 1% false positives is acceptable, your hash table can be as small as about 100 * 20 million bits or roughly 200 MiB. For 1 in 1000 false positives, 2GiB.

    Using several hash functions instead of one can improve the false positive rate for the same amount of bits.

提交回复
热议问题