What's the best hashing algorithm to use on an STL string when using hash_map?

生来不讨喜 2020-12-04 10:41

I've found the standard hashing function on VS2005 is painfully slow when trying to achieve high-performance lookups. What are some good examples of fast and efficient has

11 Answers
  •  囚心锁ツ
    2020-12-04 11:40

    From Hash Functions all the way down:

    MurmurHash got quite popular, at least in game developer circles, as a “general hash function”.

    It’s a fine choice, but let’s see later if we can generally do better. Another fine choice, especially if you know more about your data than “it’s gonna be an unknown number of bytes”, is to roll your own (e.g. see Won Chun’s replies, or Rune’s modified xxHash/Murmur that are specialized for 4-byte keys etc.). If you know your data, always try to see whether that knowledge can be used for good effect!

    Without more information, I would recommend MurmurHash as a general-purpose non-cryptographic hash function. For small strings (about the size of the average identifier in programs), the very simple and well-known djb2 and FNV are very good.

    Here (data sizes < 10 bytes) we can see that the ILP smartness of other algorithms does not get to show itself, and the super-simplicity of FNV or djb2 wins in performance.

    djb2

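    /* djb2: start from 5381 and fold in each byte as hash * 33 + c. */
    unsigned long
    djb2_hash(const unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;

        /* Stop at the terminating NUL byte of the C string. */
        while ((c = *str++) != 0)
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }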
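
Since the question is about string keys in a hash map, here is a minimal sketch of how the djb2 recurrence could be wrapped for `std::string`. It assumes a C++11 compiler and `std::unordered_map` rather than the VS2005 `hash_map` from the question, whose customization interface is different. The names `Djb2Hash` and `Djb2Map` are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical functor: applies the djb2 recurrence to every byte of the string.
// Unlike the C version it uses the string's length, so embedded NUL bytes are hashed too.
struct Djb2Hash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 5381;
        for (unsigned char c : s) {
            h = h * 33 + c;  // same recurrence as ((h << 5) + h) + c
        }
        return h;
    }
};

// A string-keyed map that uses the functor above instead of std::hash.
using Djb2Map = std::unordered_map<std::string, int, Djb2Hash>;
```

A `Djb2Map` is then used like any other `std::unordered_map`, for example `Djb2Map counts; counts["foo"] += 1;`. Whether this actually beats the default hasher depends on the key distribution, so it is worth measuring on real data.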
    

    FNV-1

    hash = FNV_offset_basis
    for each byte_of_data to be hashed
         hash = hash × FNV_prime
         hash = hash XOR byte_of_data
    return hash
    

    FNV-1a

    hash = FNV_offset_basis
    for each byte_of_data to be hashed
         hash = hash XOR byte_of_data
         hash = hash × FNV_prime
    return hash
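
The two pseudocode variants above can be sketched in C++ as follows. This assumes the 32-bit FNV parameters (offset basis 2166136261, prime 16777619); the 64-bit variant uses different constants. The function and constant names are illustrative, not from any library.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV parameters (assumed variant; other widths use other constants).
constexpr std::uint32_t kFnvOffsetBasis32 = 2166136261u;  // 0x811C9DC5
constexpr std::uint32_t kFnvPrime32 = 16777619u;          // 0x01000193

// FNV-1: multiply by the prime first, then XOR in the next byte.
inline std::uint32_t fnv1_32(const std::string& s) {
    std::uint32_t h = kFnvOffsetBasis32;
    for (unsigned char byte : s) {
        h *= kFnvPrime32;  // wraps modulo 2^32 by design
        h ^= byte;
    }
    return h;
}

// FNV-1a: XOR in the next byte first, then multiply by the prime.
inline std::uint32_t fnv1a_32(const std::string& s) {
    std::uint32_t h = kFnvOffsetBasis32;
    for (unsigned char byte : s) {
        h ^= byte;
        h *= kFnvPrime32;  // wraps modulo 2^32 by design
    }
    return h;
}
```

The only difference between the two is the order of the XOR and the multiply. For an empty string both return the offset basis unchanged.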
    

    A note about security and availability

    Hash functions can make your code vulnerable to denial-of-service attacks. If an attacker can predict your hashes and craft inputs that all collide, lookups in the affected hash table degrade from roughly constant time to linear time, and your server may not be able to keep up with requests.

    Some hash functions like MurmurHash accept a seed that you can provide to drastically reduce the ability of attackers to predict the hashes your server software is generating. Keep that in mind.
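
As a rough illustration of seeding, the sketch below implements a 32-bit MurmurHash3-style function that takes a seed, plus a hasher that draws its seed once from `std::random_device`. It is written from the publicly documented MurmurHash3 x86_32 algorithm and should be checked against the reference implementation before being relied on. The names `murmur3_32`, `SeededStringHash`, `SeededMap` and `make_seeded_map` are hypothetical, and `std::unordered_map` is assumed in place of `hash_map`. A random seed makes hashes harder to predict; it is not a cryptographic guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <string>
#include <unordered_map>

inline std::uint32_t rotl32(std::uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// Sketch of MurmurHash3 (x86, 32-bit) over the bytes of a std::string.
inline std::uint32_t murmur3_32(const std::string& key, std::uint32_t seed) {
    const std::uint32_t c1 = 0xcc9e2d51u;
    const std::uint32_t c2 = 0x1b873593u;
    const unsigned char* data =
        reinterpret_cast<const unsigned char*>(key.data());
    const std::size_t len = key.size();
    std::uint32_t h = seed;

    // Body: mix in 4 bytes at a time (read in native byte order).
    const std::size_t nblocks = len / 4;
    for (std::size_t i = 0; i < nblocks; ++i) {
        std::uint32_t k;
        std::memcpy(&k, data + 4 * i, 4);
        k *= c1; k = rotl32(k, 15); k *= c2;
        h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64u;
    }

    // Tail: mix in the remaining 0 to 3 bytes.
    const unsigned char* tail = data + 4 * nblocks;
    const std::size_t rem = len & 3;
    std::uint32_t k = 0;
    if (rem >= 3) k ^= static_cast<std::uint32_t>(tail[2]) << 16;
    if (rem >= 2) k ^= static_cast<std::uint32_t>(tail[1]) << 8;
    if (rem >= 1) {
        k ^= tail[0];
        k *= c1; k = rotl32(k, 15); k *= c2;
        h ^= k;
    }

    // Finalization: fold in the length and scramble the bits.
    h ^= static_cast<std::uint32_t>(len);
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Hasher that carries a seed chosen by the caller.
struct SeededStringHash {
    std::uint32_t seed;
    explicit SeededStringHash(std::uint32_t s) : seed(s) {}
    std::size_t operator()(const std::string& s) const {
        return murmur3_32(s, seed);
    }
};

using SeededMap = std::unordered_map<std::string, int, SeededStringHash>;

// Builds a map whose seed is drawn once, so it differs between maps and runs.
inline SeededMap make_seeded_map() {
    std::random_device rd;
    return SeededMap(16, SeededStringHash(static_cast<std::uint32_t>(rd())));
}
```

Because the seed differs from run to run, the iteration order and bucket layout of such a map will also differ between runs, so nothing should depend on them.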
