Creating a hash code for use in a database (i.e. not using GetHashCode)

迷失自我 2021-01-22 01:28

I have recently been instructed in the ways of GetHashCode() and in particular "Consumers of GetHashCode cannot rely upon it being stable over time or across appdomains" (From

3 answers
  •  暗喜 (original poster)
    2021-01-22 02:14

    I would encourage you to consider what the others have said: let the database do what it's good at. Creating a hash code in order to optimize lookups is an indication that the indexes on your table aren't what they should be.

    That said, if you really need a hash code:

    You don't say if you want a 32-bit or 64-bit hash code. This one will create a 64-bit hash code for a string. It's reasonably collision-resistant.

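    public static long ComputeHashCode(string url)
    {
        // FNV-1a 64-bit prime; the initial hash value is the FNV-1a offset basis.
        const ulong p = 1099511628211;

        ulong hash = 14695981039346656037;

        // unchecked: the arithmetic relies on wrap-around, so it must not
        // throw even if the project is compiled with overflow checking.
        unchecked
        {
            // FNV-1a over the string's UTF-16 code units.
            for (int i = 0; i < url.Length; ++i)
            {
                hash = (hash ^ url[i]) * p;
            }

            // Thomas Wang's 64-bit mixer: spreads the bits further.
            hash = (~hash) + (hash << 21);
            hash = hash ^ (hash >> 24);
            hash = (hash + (hash << 3)) + (hash << 8);
            hash = hash ^ (hash >> 14);
            hash = (hash + (hash << 2)) + (hash << 4);
            hash = hash ^ (hash >> 28);
            hash = hash + (hash << 31);

            // UNKNOWN_RECORD_HASH is a sentinel constant defined elsewhere
            // (not shown here); a real record must never hash to it.
            if (hash == (ulong)UNKNOWN_RECORD_HASH)
            {
                ++hash;
            }
            return (long)hash;
        }
    }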
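
    If a 32-bit hash code is enough, the same FNV-1a loop works with the 32-bit constants. This is only a minimal sketch, not part of the original answer: the class and method names are made up for illustration, and it leaves out both the bit mixer and the sentinel check.

    ```csharp
    using System;

    public static class StableHash32
    {
        // FNV-1a, 32-bit variant, applied to the string's UTF-16 code units.
        // Illustrative only: no extra bit mixing and no sentinel handling.
        public static int ComputeHashCode32(string text)
        {
            const uint prime = 16777619;   // FNV 32-bit prime
            uint hash = 2166136261;        // FNV 32-bit offset basis

            unchecked
            {
                foreach (char c in text)
                {
                    hash = (hash ^ c) * prime;
                }
                return (int)hash;
            }
        }
    }
    ```

    Unlike GetHashCode(), the result depends only on the characters of the string, so it stays the same across runs and machines. Keep in mind that a 32-bit range is small: by the birthday bound, a collision becomes more likely than not at somewhere around 77,000 distinct strings.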
    

    Note that this is a hash code, not a unique key, so collisions are possible, although they are unlikely until the table gets very large. Rule of thumb: the chance of at least one collision becomes substantial once the number of items approaches the square root of your hash code's range. This hash code has a range of 2^64, so with 2^32 (about 4.3 billion) items the chance of a collision is roughly 40%, and it reaches 50% at about 5.1 billion items. With 1 billion items it is still under 3%.
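
    To put numbers on the rule of thumb, here is a rough sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). It assumes the hash values are uniformly distributed, which a real hash only approximates, and the class and method names are made up for illustration.

    ```csharp
    using System;

    public static class BirthdayBound
    {
        // Approximate probability of at least one collision among n items
        // hashed uniformly into 2^bits possible values.
        public static double CollisionProbability(double n, int bits)
        {
            double range = Math.Pow(2.0, bits);
            return 1.0 - Math.Exp(-(n * (n - 1.0)) / (2.0 * range));
        }

        // Approximate number of items at which the collision probability
        // reaches p (0 < p < 1), obtained by inverting the formula above.
        public static double ItemsForProbability(double p, int bits)
        {
            double range = Math.Pow(2.0, bits);
            return Math.Sqrt(2.0 * range * Math.Log(1.0 / (1.0 - p)));
        }
    }
    ```

    Under these assumptions, CollisionProbability(4294967296.0, 64) works out to roughly 0.39, and ItemsForProbability(0.5, 64) to roughly 5.06 billion, so the square root of the range is a good order-of-magnitude guide rather than an exact threshold.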

    See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=792 and http://en.wikipedia.org/wiki/Birthday_paradox#Probability_table for more information.
