What's a good hash function for English words?

情书的邮戳 2020-11-27 19:17

I have a long list of English words and I would like to hash them. What would be a good hashing function? So far my hashing function sums the ASCII values of the letters the

4 Answers
  •  长情又很酷
    2020-11-27 19:59

    Simply summing the letters is not a good strategy, because any permutation of the same letters gives the same result.

    This one (djb2) is quite popular and works nicely with ASCII strings.

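    /* djb2: start from 5381, then for each byte compute hash * 33 + c.
       Overflow simply wraps around, since unsigned long is unsigned. */
    unsigned long hashstring(const unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;
    
        /* Loop until the terminating NUL byte; the extra parentheses mark
           the assignment inside the condition as intentional. */
        while ((c = *str++))
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    
        return hash;
    }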
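The hash itself is a full `unsigned long`; to use it in a table it is typically reduced to a bucket index modulo the table size. A minimal, self-contained sketch, assuming a table with `table_size` buckets (`djb2` and `word_bucket` are hypothetical names; `djb2` is the same algorithm as `hashstring` above):

```c
#include <assert.h>
#include <stddef.h>

/* Same algorithm as hashstring: hash = hash * 33 + c, starting at 5381.
 * unsigned long arithmetic wraps on overflow, which is intended here. */
unsigned long djb2(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    unsigned long hash = 5381;
    int c;

    while ((c = *p++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash;
}

/* Hypothetical helper: map a word to one of table_size buckets. */
size_t word_bucket(const char *word, size_t table_size)
{
    assert(table_size > 0); /* avoid division by zero */
    return (size_t)(djb2(word) % table_size);
}
```

Because each character is folded into a running `hash * 33 + c`, the position of a letter matters, so `djb2("listen")` and `djb2("silent")` differ, unlike with a plain sum. Once the hash overflows, the exact values depend on the width of `unsigned long` (commonly 64 bits on Linux and 32 bits on Windows).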
    

    More info here.

    If you need more alternatives and some performance measurements, read here.

    Added: These are general-purpose hash functions, meant for the most common scenario, where the input domain is not known in advance (apart from perhaps some very general assumptions; e.g. the one above works slightly better with ASCII input). If you have a known, restricted domain (a fixed set of inputs), you can do better; see Fionn's answer.
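One simple way to exploit a fixed word list, sketched here in C under stated assumptions (this is not necessarily what Fionn's answer describes): keep djb2, but search for the smallest table size at which no two words in the set share a slot, so lookups need no collision handling. `find_collision_free_size` and its parameters are hypothetical; dedicated perfect-hash generators such as GNU gperf go much further.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* djb2, as in the answer: hash = hash * 33 + c, starting at 5381. */
static unsigned long djb2(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    unsigned long hash = 5381;
    int c;

    while ((c = *p++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash;
}

/* Hypothetical helper: return the smallest table size m in [n, max_size]
 * such that djb2(word) % m differs for every word in the fixed set, or 0
 * if no such size exists in that range (e.g. the set has duplicates). */
size_t find_collision_free_size(const char *const *words, size_t n,
                                size_t max_size)
{
    assert(n > 0);
    bool *used = malloc(max_size * sizeof *used);
    if (used == NULL)
        return 0;

    for (size_t m = n; m <= max_size; m++) {
        for (size_t i = 0; i < m; i++)
            used[i] = false;

        bool ok = true;
        for (size_t i = 0; i < n && ok; i++) {
            size_t slot = (size_t)(djb2(words[i]) % m);
            if (used[slot])
                ok = false; /* two words share a slot: try a larger m */
            else
                used[slot] = true;
        }
        if (ok) {
            free(used);
            return m;
        }
    }
    free(used);
    return 0;
}
```

Once a size `m` is found, looking a word up costs one hash, one `% m`, and a single string comparison to confirm the word is actually in the set. The search recomputes every hash for each candidate size, which is acceptable for a one-off build step but could be sped up by caching the hashes.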
