What's the best hashing algorithm to use on a stl string when using hash_map?

前端未结

关注

 11  2315

I\'ve found the standard hashing function on VS2005 is painfully slow when trying to achieve high performance look ups. What are some good examples of fast and efficient has

相关标签:

11条回答

孤城傲影

2020-12-04 11:35

If your strings are on average longer than a single cache line, but their length+prefix are reasonably unique, consider hasing just the length+first 8/16 characters. (The length is contained in the std::string object itself and therefore cheap to read)

0 讨论(0)
发布评论:

提交评论
- 加载中...
囚心锁ツ

2020-12-04 11:40
From Hash Functions all the way down:

MurmurHash got quite popular, at least in game developer circles, as a “general hash function”.

It’s a fine choice, but let’s see later if we can generally do better. Another fine choice, especially if you know more about your data than “it’s gonna be an unknown number of bytes”, is to roll your own (e.g. see Won Chun’s replies, or Rune’s modified xxHash/Murmur that are specialized for 4-byte keys etc.). If you know your data, always try to see whether that knowledge can be used for good effect!

Without more information I would recommend MurmurHash as a general purpose non-cryptographic hash function. For small strings (of the size of the average identifier in programs) the very simple and famous djb2 and FNV are very good.

Here (data sizes < 10 bytes) we can see that the ILP smartness of other algorithms does not get to show itself, and the super-simplicity of FNV or djb2 win in performance.

djb2
```
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}
```
FNV-1
```
hash = FNV_offset_basis
for each byte_of_data to be hashed
     hash = hash × FNV_prime
     hash = hash XOR byte_of_data
return hash
```
FNV-1A
```
hash = FNV_offset_basis
for each byte_of_data to be hashed
     hash = hash XOR byte_of_data
     hash = hash × FNV_prime
return hash
```
A note about security and availability

Hash functions can make your code vulnerable to denial-of-service attacks. If an attacker is able to force your server to handle too many collisions, your server may not be able to cope with requests.

Some hash functions like MurmurHash accept a seed that you can provide to drastically reduce the ability of attackers to predict the hashes your server software is generating. Keep that in mind.
0 讨论(0)
发布评论:

提交评论
- 加载中...
暗喜

2020-12-04 11:41

Boost has an boost::hash library which can provides some basic hash functions for most common types.

0 讨论(0)
发布评论:

提交评论
- 加载中...

忘掉有多难

2020-12-04 11:43

From some old code of mine:

/* magic numbers from http://www.isthe.com/chongo/tech/comp/fnv/ */
static const size_t InitialFNV = 2166136261U;
static const size_t FNVMultiple = 16777619;

/* Fowler / Noll / Vo (FNV) Hash */
size_t myhash(const string &s)
{
    size_t hash = InitialFNV;
    for(size_t i = 0; i < s.length(); i++)
    {
        hash = hash ^ (s[i]);       /* xor  the low 8 bits */
        hash = hash * FNVMultiple;  /* multiply by the magic number */
    }
    return hash;
}

Its fast. Really freaking fast.

0 讨论(0)

孤独总比滥情好

2020-12-04 11:44

If you are hashing a fixed set of words, the best hash function is often a perfect hash function. However, they generally require that the set of words you are trying to hash is known at compile time. Detection of keywords in a lexer (and translation of keywords to tokens) is a common usage of perfect hash functions generated with tools such as gperf. A perfect hash also lets you replace hash_map with a simple array or vector.

If you're not hashing a fixed set of words, then obviously this doesn't apply.

0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2

What's the best hashing algorithm to use on a stl string when using hash_map?

djb2

FNV-1

FNV-1A

A note about security and availability