Best hashing algorithm in terms of hash collisions and performance for strings

忘掉有多难 2020-11-28 03:37

What would be the best hashing algorithm if we had the following priorities (in that order):

  1. Minimal hash collisions
  2. Performance

It doe

9 answers
  •  借酒劲吻你
    2020-11-28 03:47

    As Nigel Campbell indicated, there's no such thing as the 'best' hash function: it depends on the characteristics of the data you're hashing and on whether you need cryptographic-quality hashes.

    That said, here are some pointers:

    • Since the items you're using as input to the hash are just a set of strings, you could simply combine the hashcodes for each of those individual strings. I've seen the following pseudo-code suggested to do this, but I don't know of any particular analysis of it:

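      int hashCode = 0;

      // unchecked: let the arithmetic wrap on overflow rather than throw
      // when the project is compiled with overflow checking enabled
      unchecked {
          foreach (string s in propertiesToHash) {
              hashCode = 31 * hashCode + s.GetHashCode();
          }
      }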
      

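      According to this article, System.Web has an internal method that combines hashcodes by multiplying the running value by 33 (a shift by 5 plus the value itself) and XORing in the next hashcode: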

      combinedHash = ((combinedHash << 5) + combinedHash) ^ nextObj.GetHashCode();
      

      I've also seen code that simply XORs the hashcodes together, but that seems like a bad idea to me (though I again have no analysis to back this up). If nothing else, XOR is commutative, so you get a collision whenever the same strings are hashed in a different order, and two identical strings cancel each other out.

    • I've used FNV to good effect: http://www.isthe.com/chongo/tech/comp/fnv/

    • Paul Hsieh has a decent article: http://www.azillionmonkeys.com/qed/hash.html

    • Another nice article, by Bob Jenkins, originally published in Dr. Dobb's Journal in 1997 (the linked article has updates): http://burtleburtle.net/bob/hash/doobs.html
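
    To make the pointers above concrete, here is a minimal Python sketch (not taken from any of the linked articles). It assumes 32-bit FNV-1a over the UTF-8 bytes of each string as the per-string hash, standing in for GetHashCode(), and compares the multiply-by-31 combine with the plain XOR combine. The function names are hypothetical, chosen only for illustration.

```python
from typing import Iterable

# Standard 32-bit FNV parameters.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193
MASK_32 = 0xFFFFFFFF  # Python ints are unbounded, so wrap to 32 bits by hand


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of text (encoding is an assumption)."""
    h = FNV_OFFSET_BASIS_32
    for byte in text.encode("utf-8"):
        h ^= byte                         # FNV-1a: XOR the byte in first...
        h = (h * FNV_PRIME_32) & MASK_32  # ...then multiply by the prime
    return h


def combine_ordered(strings: Iterable[str]) -> int:
    """Order-sensitive combine: h = 31*h + hash(s), wrapped to 32 bits."""
    h = 0
    for s in strings:
        h = (31 * h + fnv1a_32(s)) & MASK_32
    return h


def combine_xor(strings: Iterable[str]) -> int:
    """Plain XOR combine, shown only to illustrate its weaknesses."""
    h = 0
    for s in strings:
        h ^= fnv1a_32(s)
    return h


if __name__ == "__main__":
    print(hex(fnv1a_32("a")))  # 0xe40c292c, the published FNV-1a test vector
    # XOR ignores order and cancels duplicates:
    print(combine_xor(["a", "b"]) == combine_xor(["b", "a"]))  # True
    print(combine_xor(["a", "a"]))                             # 0
    # The multiply-and-add combine tells these two orderings apart:
    print(combine_ordered(["a", "b"]) == combine_ordered(["b", "a"]))  # False
```

    This only demonstrates the ordering and cancellation behaviour; it says nothing about collision rates on real data, which you would still need to measure on your own inputs.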
