What's a good hash function for English words?

后端未结

关注

 4  673

情书的邮戳 2020-11-27 19:17

I have a long list of English words and I would like to hash them. What would be a good hashing function? So far my hashing function sums the ASCII values of the letters the

4条回答

温柔的废话 (楼主)

2020-11-27 19:58
A bit late, but here is a hashing function with an extremely low collision rate for 64-bit version below, and ~almost~ as good for the 32-bit version:
```
uint64_t slash_hash(const char *s)
//uint32_t slash_hash(const char *s)
{
    union { uint64_t h; uint8_t u[8]; } uu;
    int i=0; uu.h=strlen(s);
    while (*s) { uu.u[i%8] += *s + i + (*s >> ((uu.h/(i+1)) % 5)); s++; i++; }
    return uu.h; //64-bit
    //return (uu.h+(uu.h>>32)); //32-bit
}
```
The hash-numbers are also very evenly spread across the possible range, with no clumping that I could detect - this was checked using the random strings only.
[edit]
Also tested against words extracted from local text-files combined with LibreOffice dictionary/thesaurus words (English and French - more than 97000 words and constructs) with 0 collisions in 64-bit and 1 collision in 32-bit :)

(Also compared with FNV1A_Hash_Yorikke, djb2 and MurmurHash2 on same sets: Yorikke & djb2 did not do well; slash_hash did slightly better than MurmurHash2 in all the tests)
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...