HashSet of Strings taking up too much memory, suggestions…?

闹比i 2020-12-29 08:44

I am currently storing a list of words (around 120,000) in a HashSet, which I use to check whether entered words are spelt correctly.

8 answers
  •  死守一世寂寞
    2020-12-29 09:35

    This might be a bit late, but a quick Google search will turn up the DAWG (directed acyclic word graph) investigation and C code that I posted a while ago.

    http://www.pathcom.com/~vadco/dawg.html

    TWL06 - 178,691 words - fits into 494,676 Bytes

    The downside of a compressed-shared-node structure is that it does not work as a hash function for the words in your list. That is to say, it will tell you if a word exists, but it will not return an index to related data for a word that does exist.
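To make the shared-node idea concrete, here is a minimal Java sketch. It is not the C code from the link above, and it makes no attempt at the bit-packed layout that reaches the byte counts quoted there. It only assumes the general technique: build a trie, then merge nodes whose subtrees are identical. The class name `Dawg` and its methods are hypothetical, chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative shared-node word graph (hypothetical class, not the linked C code). */
final class Dawg {
    /** One node: a terminal flag plus outgoing edges sorted by letter. */
    private static final class Node {
        boolean terminal;
        final TreeMap<Character, Node> children = new TreeMap<>();
        int id = -1; // set once this node is the canonical copy of its subtree
    }

    private Node root = new Node();
    private final int nodeCount;

    Dawg(Iterable<String> words) {
        // Step 1: build an ordinary trie.
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
        // Step 2: merge identical subtrees, working from the leaves upward.
        Map<String, Node> registry = new HashMap<>();
        root = minimize(root, registry);
        nodeCount = registry.size();
    }

    private static Node minimize(Node node, Map<String, Node> registry) {
        // The signature encodes the terminal flag and each (letter, canonical child id) pair.
        StringBuilder sig = new StringBuilder(node.terminal ? "1" : "0");
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            Node canonical = minimize(e.getValue(), registry);
            e.setValue(canonical); // point the edge at the shared copy
            sig.append(e.getKey()).append(canonical.id).append(';');
        }
        Node existing = registry.putIfAbsent(sig.toString(), node);
        if (existing != null) {
            return existing; // an equivalent subtree already exists; reuse it
        }
        node.id = registry.size() - 1;
        return node;
    }

    /** Membership only: answers yes or no, with no index for the word. */
    boolean contains(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.get(c);
            if (n == null) {
                return false;
            }
        }
        return n.terminal;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        Dawg dawg = new Dawg(List.of("tap", "taps", "top", "tops"));
        System.out.println(dawg.contains("tops") + " " + dawg.contains("to"));
        System.out.println("nodes after merging: " + dawg.nodeCount());
    }
}
```

For the four sample words, a plain trie needs 8 nodes, while the merged graph needs 5, because the `ap(s)` and `op(s)` branches share their tails. Note that `contains` can only answer yes or no, which is exactly the limitation described above.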

    If you want the perfect and complete hash functionality, in a processor-cache sized structure, you are going to have to read, understand, and modify a data structure called the ADTDAWG. It will be slightly larger than a traditional DAWG, but it is faster and more useful.
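As a rough illustration of what "returning an index" means, the Java sketch below uses a much simpler approach than the ADTDAWG and should not be read as a description of it. It assumes a plain, unmerged trie in which every node stores how many words end at or below it; summing the counts of the branches skipped during lookup gives each word a unique number (its rank in sorted order), which can then index an array of related data. The names `IndexedTrie` and `indexOf` are hypothetical.

```java
import java.util.TreeMap;

/** Illustrative trie that maps each stored word to a unique index (hypothetical class). */
final class IndexedTrie {
    private static final class Node {
        boolean terminal;
        int wordsBelow; // number of words ending at or below this node
        final TreeMap<Character, Node> children = new TreeMap<>();
    }

    private final Node root = new Node();

    IndexedTrie(Iterable<String> words) {
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
        count(root);
    }

    /** Fills in wordsBelow for every node, bottom-up. */
    private static int count(Node n) {
        int total = n.terminal ? 1 : 0;
        for (Node child : n.children.values()) {
            total += count(child);
        }
        n.wordsBelow = total;
        return total;
    }

    /** Returns the word's 0-based rank in sorted order, or -1 if it is not stored. */
    int indexOf(String word) {
        Node n = root;
        int index = 0;
        for (char c : word.toCharArray()) {
            if (n.terminal) {
                index++; // the prefix read so far is itself a word and sorts first
            }
            Node next = n.children.get(c);
            if (next == null) {
                return -1;
            }
            // Every word under a sibling with a smaller letter sorts before this one.
            for (Node sibling : n.children.headMap(c).values()) {
                index += sibling.wordsBelow;
            }
            n = next;
        }
        return n.terminal ? index : -1;
    }
}
```

With the words `tap`, `taps`, `top`, `tops`, `indexOf` returns 0, 1, 2 and 3 respectively, and -1 for anything else, so the result can be used directly as a slot in a parallel array of per-word data.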

    http://www.pathcom.com/~vadco/adtdawg.html

    All the very best,

    JohnPaul Adamovsky
