What is the best 32-bit hash function for short strings (tag names)?

傲寒 2020-12-12 16:43

What is the best 32-bit hash function for relatively short strings?

Strings are tag names that consist of English letters, numbers, spaces and some additional charact

8 Answers
  • 2020-12-12 16:58

    That depends on your hardware. On modern hardware, i.e. Intel/AMD with SSE4.2 or ARMv8 with the CRC extension, you should use the hardware CRC32 instructions (on x86, the _mm_crc32_uxx intrinsics), as they are optimal for short strings. (They are also good for long keys, but there it is better to use Adler's threaded version, as in zlib.)

    On old or unknown hardware, either probe at run time for the SSE4.2 or CRC32 feature, or just use one of the simple, good hash functions, e.g. Murmur2 or City.
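
As a rough illustration of the run-time probe described above, here is a minimal sketch, assuming GCC or Clang. The names `tag_hash_crc32c`, `crc32c_hw` and `crc32c_sw` are hypothetical and not taken from the linked repository. On x86 it uses `_mm_crc32_u8` when `__builtin_cpu_supports("sse4.2")` reports the feature, and otherwise falls back to a slow but portable bitwise CRC-32C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>          /* _mm_crc32_u8 (SSE4.2) */
#define HAVE_X86_CRC32 1
#endif

/* Portable bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_sw(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#ifdef HAVE_X86_CRC32
/* The target attribute lets this compile without -msse4.2 (GCC/Clang only). */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}
#endif

/* Hypothetical entry point: hash a short tag name of `len` bytes. */
uint32_t tag_hash_crc32c(const char *s, size_t len)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    uint32_t crc = 0xFFFFFFFFu;             /* conventional initial value */
#ifdef HAVE_X86_CRC32
    if (__builtin_cpu_supports("sse4.2"))   /* run-time feature probe */
        return ~crc32c_hw(crc, p, len);
#endif
    return ~crc32c_sw(crc, p, len);         /* software fallback */
}
```

Both paths compute the same CRC-32C, so the result does not depend on which one runs. A production version would process 4 or 8 bytes per instruction and use a table-driven fallback; the byte-at-a-time loops here only keep the sketch short.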

    An overview of quality and performance is here: https://github.com/rurban/smhasher#smhasher

    All the implementations are there as well. Favored are https://github.com/rurban/smhasher/blob/master/crc32_hw.c and https://github.com/rurban/smhasher/blob/master/MurmurHash2.cpp
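
For readers who want to see what a Murmur2-style hash involves without opening the repository, here is a sketch of the 32-bit MurmurHash2 mixing steps. The function name `murmur2_32` is an assumption, and this version reads each 4-byte block with `memcpy` to avoid unaligned access, so treat the linked MurmurHash2.cpp as the reference rather than this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of 32-bit MurmurHash2; `seed` can be any value, e.g. 0. */
uint32_t murmur2_32(const void *key, size_t len, uint32_t seed)
{
    assert(key != NULL || len == 0);
    const uint32_t m = 0x5bd1e995u;   /* mixing constant */
    const int r = 24;                 /* mixing shift */
    const unsigned char *data = (const unsigned char *)key;
    uint32_t h = seed ^ (uint32_t)len;

    /* Mix four bytes at a time into the hash. */
    while (len >= 4) {
        uint32_t k;
        memcpy(&k, data, sizeof k);   /* result depends on host byte order */
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
        data += 4;
        len -= 4;
    }

    /* Handle the last one to three bytes. */
    switch (len) {
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
    }

    /* Final avalanche so the last bytes affect all output bits. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}
```

A call such as `murmur2_32(tag, strlen(tag), 0)` yields a 32-bit value that can be reduced to a table index. Because blocks are read in host byte order, values differ between little-endian and big-endian machines.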

    If you know the keys in advance, use a perfect hash, not a hash function. E.g. gperf or my phash: https://github.com/rurban/Perfect-Hash#name

    Nowadays perfect hash generation via a C compiler is so fast that you can even create perfect hashes on the fly and load them dynamically.

  • 2020-12-12 17:09

    I'm not sure if it's the best choice, but here is a hash function for strings:

    The Practice of Programming (HASH TABLES, pg. 57)

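    #define MULTIPLIER 31  /* 31 or 37; see the note below */

    /* hash: compute hash value of string */
    unsigned int hash(const char *str)
    {
        unsigned int h = 0;
        const unsigned char *p;

        /* Read bytes as unsigned so characters >= 0x80 do not sign-extend. */
        for (p = (const unsigned char *)str; *p != '\0'; p++)
            h = MULTIPLIER * h + *p;
        return h; /* or h % ARRAY_SIZE for a table index */
    }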
    

    Empirically, the values 31 and 37 have proven to be good choices for the multiplier in a hash function for ASCII strings.
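
To make the role of the multiplier concrete, here is a small sketch that takes it as a parameter and reduces the result to a bucket index. The names `tag_hash` and `tag_bucket`, and the use of `uint32_t` for a fixed 32-bit result, are assumptions for illustration and are not from the book.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplicative string hash; `multiplier` would be 31 or 37. */
uint32_t tag_hash(const char *s, uint32_t multiplier)
{
    assert(s != NULL);
    uint32_t h = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        h = multiplier * h + *p;    /* wraps modulo 2^32 by design */
    return h;
}

/* Map a tag name to one of `nbuckets` hash-table slots using multiplier 31. */
size_t tag_bucket(const char *s, size_t nbuckets)
{
    assert(nbuckets > 0);
    return tag_hash(s, 31u) % nbuckets;
}
```

For the two-character tag "ab" ('a' is 97, 'b' is 98), the hash with multiplier 31 is 97 * 31 + 98 = 3105, and with 37 it is 97 * 37 + 98 = 3687. With 16 buckets, `tag_bucket("ab", 16)` gives 3105 % 16 = 1.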
