What's the best hashing algorithm to use on a stl string when using hash_map?

I've found the standard hashing function on VS2005 is painfully slow when trying to achieve high-performance lookups. What are some good examples of fast and efficient has

11 answers

感情败类

2020-12-04 11:18
I worked with Paul Larson of Microsoft Research on some hashtable implementations. He investigated a number of string hashing functions on a variety of datasets and found that a simple multiply by 101 and add loop worked surprisingly well.
```
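// Paul Larson's string hash: multiply the running value by 101,
// then add the next character, until the terminating NUL.
unsigned int
hash(
    const char* s,
    unsigned int seed = 0)
{
    unsigned int h = seed;  // renamed so the local does not shadow the function
    while (*s)
    {
        h = h * 101  +  *s++;  // unsigned overflow simply wraps around
    }
    return h;
}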
```
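
To show how a loop like this plugs into a hash table, here is a minimal sketch. It assumes a C++11 compiler and uses `std::unordered_map` in place of `hash_map`; the names `LarsonHash` and `LarsonMap` are made up for illustration, and the `unsigned char` cast is an addition that is not in the snippet above.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical functor wrapping the multiply-by-101-and-add loop.
struct LarsonHash {
    std::size_t operator()(const std::string& s) const {
        unsigned int h = 0;
        for (std::string::size_type i = 0; i < s.size(); ++i) {
            // Assumption: cast through unsigned char so bytes >= 0x80
            // are not sign-extended on platforms where char is signed.
            h = h * 101 + static_cast<unsigned char>(s[i]);
        }
        return h;
    }
};

// Hypothetical alias: a string-keyed table that uses the functor above.
typedef std::unordered_map<std::string, int, LarsonHash> LarsonMap;
```

For the two-character key "ab" the loop computes (0 * 101 + 97) * 101 + 98 = 9895, which is easy to check by hand.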

滥情空心

2020-12-04 11:22

I did a little searching and, funnily enough, Paul Larson's little algorithm showed up at http://www.strchr.com/hash_functions as having the fewest collisions of any function tested under a number of conditions, and it's very fast for one that isn't unrolled or table driven.

Larson's is the simple multiply-by-101-and-add loop above.

礼貌的吻别

2020-12-04 11:25
One classic suggestion for a string hash is to step through the characters one by one, adding their ASCII/Unicode values to an accumulator and multiplying the accumulator by a prime number each time (allowing the hash value to overflow).
```
  // Primary template; only the string specialization below is meant to be used.
  template <typename T> struct myhash{};

  template <> struct myhash<string>
    {
    size_t operator()(const string &to_hash) const
      {
      const char * in = to_hash.c_str();
      size_t out=0;
      while('\0' != *in)
        {
        out*= 53; //just a prime number
        out+= *in;
        ++in;
        }
      return out;
      }
    };

  hash_map<string, int, myhash<string> > my_hash_map;
```
It's hard to get faster than that without throwing out data. If you know your strings can be differentiated by only a few characters rather than their whole content, you can go faster.
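
As a rough sketch of that "throw out data" idea, the functor below hashes only a bounded prefix of each string. The name `PrefixHash` and the `MaxChars` parameter are hypothetical, and the approach only pays off under the assumption that your keys really do differ within the first few characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical functor: hashes at most the first MaxChars characters.
// Strings that share that prefix collide by design, so this only helps
// when the keys are known to differ early.
template <std::size_t MaxChars>
struct PrefixHash {
    std::size_t operator()(const std::string& s) const {
        const std::size_t n = std::min<std::size_t>(s.size(), MaxChars);
        std::size_t out = 0;
        for (std::size_t i = 0; i < n; ++i) {
            out = out * 53 + static_cast<unsigned char>(s[i]);
        }
        return out;
    }
};
```

With `PrefixHash<4>`, the cost per lookup is bounded by four characters no matter how long the key is; the table still compares full strings on a hash match, so correctness is unaffected.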

If the hash value gets calculated too often, you might try caching it by creating a new subclass of basic_string that remembers its hash value. hash_map should be doing that internally, though.
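
A minimal sketch of the caching idea follows. Note that it swaps in composition (a wrapper class) instead of subclassing `basic_string`, since the standard string classes have no virtual destructor; `HashedString` and `HashedStringHash` are hypothetical names, and the multiplier 53 is simply carried over from the example above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical wrapper: stores the string together with its hash,
// which is computed once at construction.
class HashedString {
public:
    explicit HashedString(const std::string& s)
        : text_(s), hash_(compute(s)) {}

    const std::string& str() const { return text_; }
    std::size_t hash() const { return hash_; }

    bool operator==(const HashedString& other) const {
        // Comparing the cached hashes first rejects most mismatches cheaply.
        return hash_ == other.hash_ && text_ == other.text_;
    }

private:
    static std::size_t compute(const std::string& s) {
        std::size_t out = 0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            out = out * 53 + static_cast<unsigned char>(s[i]);
        }
        return out;
    }

    std::string text_;  // never modified after construction, so hash_ stays valid
    std::size_t hash_;
};

// Hash functor that just returns the cached value.
struct HashedStringHash {
    std::size_t operator()(const HashedString& s) const { return s.hash(); }
};
```

The trade-off is one extra `size_t` per key in exchange for never rehashing the same string twice.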

灰色年华

2020-12-04 11:29

That always depends on your data-set.

I for one had surprisingly good results using the CRC32 of the string. It works very well across a wide range of input sets.

Lots of good CRC32 implementations are easy to find on the net.
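
For reference, here is a minimal bitwise sketch of the common CRC-32 variant (reflected polynomial 0xEDB88320, the one used by zlib and PNG). The function name `crc32_bitwise` is made up, and this version is written for clarity; the implementations you would actually use for speed are typically table driven.

```cpp
#include <cstdint>
#include <string>

// Bit-at-a-time CRC-32 with the reflected polynomial 0xEDB88320.
// Slow but short; shown only to illustrate what is being computed.
std::uint32_t crc32_bitwise(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;  // standard initial value
    for (std::string::size_type i = 0; i < data.size(); ++i) {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift and fold in the polynomial.
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;  // standard final inversion
}
```

The standard check value for this variant is 0xCBF43926 for the input "123456789", which is a handy sanity test for any implementation you pick up.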

Edit: Almost forgot: This page has a nice hash-function shootout with performance numbers and test-data:

http://smallcode.weblogs.us/ <-- further down the page.

生来不讨喜

2020-12-04 11:33

Python 3.4 includes a new hash algorithm based on SipHash. PEP 456 is very informative.

醉话见心

2020-12-04 11:35

I've used the Jenkins hash to write a Bloom filter library; it has great performance.

Details and code are available here: http://burtleburtle.net/bob/c/lookup3.c

This is what Perl uses for its hashing operation, fwiw.
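
lookup3 itself is too long to reproduce here. As a stand-in, this is a sketch of Bob Jenkins's much simpler one-at-a-time hash, which is a different function from lookup3 but gives a feel for the add/shift/xor mixing style. The `std::string` signature is an assumption made for convenience.

```cpp
#include <cstdint>
#include <string>

// Jenkins one-at-a-time hash (not lookup3): mixes each byte into the
// state with shifts and xors, then applies a final avalanche step.
std::uint32_t one_at_a_time(const std::string& key) {
    std::uint32_t h = 0;
    for (std::string::size_type i = 0; i < key.size(); ++i) {
        h += static_cast<unsigned char>(key[i]);
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

For the single-character key "a" this works out to 0xCA2E9442, which can be verified by stepping through the shifts by hand.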
