How do I calculate a good hash code for a list of strings?

我寻月下人不归 2020-12-01 02:34

Background:

  • I have a short list of strings.
  • The number of strings varies, but it is nearly always on the order of a “handful”.
  • I
11 Answers
  •  予麋鹿 (original poster)
    2020-12-01 03:35

    Using GetHashCode() on its own is not ideal for combining multiple values. The problem is that if each string's hash code behaves like a simple checksum (for example, a sum of character values), adding the hash codes together leaves little entropy for similar values. E.g. with such a checksum, the combined value for ("abc", "bbc") is the same as for ("abd", "abc"), causing a collision.
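The collision described above can be reproduced with a small sketch. `Checksum` here is a hypothetical stand-in for a checksum-style string hash (it is not what `string.GetHashCode()` actually computes), and the second check illustrates a separate weakness of addition: it ignores the order of the items regardless of which string hash is used.

```csharp
using System;

public static class AdditiveCombineDemo
{
    // Hypothetical checksum-style hash: the sum of the string's UTF-16 code units.
    // This is an assumption for illustration, not the real string.GetHashCode().
    static int Checksum(string s)
    {
        int sum = 0;
        foreach (char c in s) sum += c;
        return sum;
    }

    public static void Main()
    {
        // The two different lists from the example add up to the same value.
        int first = Checksum("abc") + Checksum("bbc");
        int second = Checksum("abd") + Checksum("abc");
        Console.WriteLine(first == second); // True: both sums are 589

        // Addition is commutative, so reordering a list never changes the result.
        int forward = unchecked("abc".GetHashCode() + "bbc".GetHashCode());
        int backward = unchecked("bbc".GetHashCode() + "abc".GetHashCode());
        Console.WriteLine(forward == backward); // True
    }
}
```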

    In cases where you need to be absolutely sure, you'd use a real hash algorithm, like SHA1, MD5, etc. The only problem is that they work on blocks of data and return a byte-array digest (16 bytes for MD5, 20 for SHA1), which is awkward to compare quickly for equality. Instead, try a CRC or FNV1 hash. 32-bit FNV1 is super simple:

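    public static class Fnv1 {
        // Standard 32-bit FNV parameters.
        public const uint OffsetBasis32 = 2166136261;
        public const uint FnvPrime32 = 16777619;
    
        // Computes the 32-bit FNV-1 hash of the buffer (multiply first, then XOR).
        public static int ComputeHash32(byte[] buffer) {
            uint hash = OffsetBasis32;
    
            // unchecked: the multiplication and the final cast are meant to wrap,
            // even if the project is compiled with overflow checking enabled.
            unchecked {
                foreach (byte b in buffer) {
                    hash *= FnvPrime32;
                    hash ^= b;
                }
    
                return (int)hash;
            }
        }
    }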
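To connect this back to the original question, here is a minimal sketch of how the same FNV-1 loop could be applied to a list of strings. The `StringListHash` class, the choice of UTF-8 encoding, and the `0xFF` separator byte are all assumptions made for illustration; they are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringListHash
{
    private const uint OffsetBasis32 = 2166136261;
    private const uint FnvPrime32 = 16777619;

    // Hypothetical helper: runs FNV-1 over the UTF-8 bytes of every string in order.
    // Note that a null entry is hashed the same way as an empty string.
    public static int Compute(IEnumerable<string> items)
    {
        unchecked
        {
            uint hash = OffsetBasis32;
            foreach (string item in items)
            {
                foreach (byte b in Encoding.UTF8.GetBytes(item ?? string.Empty))
                {
                    hash = (hash * FnvPrime32) ^ b;
                }
                // 0xFF never occurs in UTF-8 output, so it marks the end of each
                // string and keeps ("ab", "c") distinct from ("a", "bc") as input.
                hash = (hash * FnvPrime32) ^ 0xFF;
            }
            return (int)hash;
        }
    }

    public static void Main()
    {
        var a = new[] { "abc", "bbc" };
        var b = new[] { "abc", "bbc" };
        Console.WriteLine(Compute(a) == Compute(b)); // True: same contents, same hash
        Console.WriteLine(Compute(new[] { "bbc", "abc" })); // order is part of the input
    }
}
```

Because each byte is mixed into the running hash in sequence, the result depends on the order of the strings, unlike simple addition of the individual hash codes. Equal hashes still do not prove equal lists, so an `Equals` implementation should compare the contents as well.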
    
