GetHashCode method with Dictionary and HashSet

久未见 submitted on 2019-12-01 11:23:54

Because the logic of ContainsKey is similar to this:

// This is a simplified model for answering the OP's question; the real one is more complex.
private List<List<KeyValuePair<TKey,TValue>>> _buckets = //....

public bool ContainsKey(TKey key)
{
    // The hash code only selects a bucket. Mask off the sign bit so the index is never negative.
    int index = (key.GetHashCode() & 0x7FFFFFFF) % _buckets.Count;
    List<KeyValuePair<TKey,TValue>> bucket = _buckets[index];
    foreach(var item in bucket)
    {
        if(key.Equals(item.Key))
            return true;
    }
    return false;
}

All GetHashCode does is pick the bucket your key would go in; the lookup must still go through each member of that bucket and find the exact match via the Equals method. That is why having good hash codes is important: the fewer items in a bucket, the faster the foreach part will be. The best possible hash code leaves only one item per bucket.
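To make the two steps concrete, here is a minimal runnable sketch of the simplified model above. SimpleMap and BucketFor are hypothetical names made up for this illustration, and a fixed array of lists stands in for the real, more complex storage; it is not how Dictionary is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Toy hash map for illustration only (hypothetical type, fixed bucket count).
public sealed class SimpleMap<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] _buckets;

    public SimpleMap(int bucketCount)
    {
        _buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
            _buckets[i] = new List<KeyValuePair<TKey, TValue>>();
    }

    // Step 1: the hash code only selects a bucket.
    private List<KeyValuePair<TKey, TValue>> BucketFor(TKey key)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF; // drop the sign bit
        return _buckets[hash % _buckets.Length];
    }

    public void Add(TKey key, TValue value)
    {
        if (ContainsKey(key))
            throw new ArgumentException("Duplicate key.");
        BucketFor(key).Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    // Step 2: Equals decides whether a bucket member really matches.
    public bool ContainsKey(TKey key)
    {
        foreach (var item in BucketFor(key))
        {
            if (key.Equals(item.Key))
                return true;
        }
        return false;
    }
}
```

With something like new SimpleMap<string, int>(4), adding "a" and then asking ContainsKey("a") walks only the one bucket that "a" hashes to.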


Here is the actual code for Contains on a HashSet (Dictionary's ContainsKey is very similar but more complex):

private int[] m_buckets;
private Slot[] m_slots;

public bool Contains(T item) {
    if (m_buckets != null) {
        int hashCode = InternalGetHashCode(item);
        // see note at "HashSet" level describing why "- 1" appears in for loop
        for (int i = m_buckets[hashCode % m_buckets.Length] - 1; i >= 0; i = m_slots[i].next) {
            if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, item)) {
                return true;
            }
        }
    }
    // either m_buckets is null or wasn't found
    return false;
}

private int InternalGetHashCode(T item) {
    if (item == null) {
        return 0;
    } 
    return m_comparer.GetHashCode(item) & Lower31BitMask;
}

internal struct Slot {
    internal int hashCode;      // Lower 31 bits of hash code, -1 if unused
    internal T value;
    internal int next;          // Index of next entry, -1 if last
}

The hashcodes don't have to be unique; they only have to be equal if the keys are equal.

Now what happens is that the items are stored in buckets. If you query whether a Dictionary<TK,TV> contains a given key or the HashSet<T> a given item, it will first calculate the hashcode to fetch the correct bucket.

Next it will iterate over all items in the bucket and perform an .Equals test on each. It returns true only if one of them matches.

In other words, one is allowed to return the same hashcode for every instance although the instances are different. It only makes the hashing inefficient.
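The cost of a constant hash code can be made visible by counting Equals calls. The sketch below is only an illustration: CountingComparer and BucketDemo are hypothetical helpers, not part of .NET, and the exact counts depend on the HashSet implementation in use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer that counts Equals calls so bucket scans become visible.
public sealed class CountingComparer : IEqualityComparer<int>
{
    private readonly Func<int, int> _hash;
    public int EqualsCalls { get; private set; }

    public CountingComparer(Func<int, int> hash) { _hash = hash; }

    public bool Equals(int x, int y) { EqualsCalls++; return x == y; }
    public int GetHashCode(int obj) { return _hash(obj); }
}

public static class BucketDemo
{
    // Fills a set with 0..n-1, then counts the Equals calls made by one failed lookup.
    public static int EqualsCallsForMiss(Func<int, int> hash, int n)
    {
        var comparer = new CountingComparer(hash);
        var set = new HashSet<int>(comparer);
        for (int i = 0; i < n; i++) set.Add(i);

        int before = comparer.EqualsCalls;
        set.Contains(-1); // -1 was never added
        return comparer.EqualsCalls - before;
    }
}
```

Calling BucketDemo.EqualsCallsForMiss(x => 0, 1000) should have to compare against every stored item, because they all share one bucket and one hash code, while BucketDemo.EqualsCallsForMiss(x => x, 1000) should need no Equals call at all. Both sets still give the correct answer; only the amount of work differs.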

C# thus stores a Dictionary<TK,TV> like:

+----------+
| 22008501 |---<car1,1>----<car3,1>----|
+----------+
| 11155414 | (other bucket)
+----------+

The left-hand column shows possible bucket labels. In practice the number of buckets is small for a small Dictionary, so an operation (for instance a modulo) is applied to the hash to reduce it to a valid bucket index.
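As a rough sketch of that reduction step, the helper below maps a hash code onto a bucket index with a modulo. BucketIndex is a hypothetical name, and the bucket count of 3 used in the note afterwards is an arbitrary choice for illustration, not a claim about the size Dictionary would pick.

```csharp
public static class BucketIndex
{
    // Reduces an arbitrary hash code to an index in [0, bucketCount).
    public static int For(int hashCode, int bucketCount)
    {
        int nonNegative = hashCode & 0x7FFFFFFF; // drop the sign bit first
        return nonNegative % bucketCount;
    }
}
```

With 3 buckets, the two hash codes from the diagram land in different buckets: BucketIndex.For(22008501, 3) is 0 and BucketIndex.For(11155414, 3) is 1. A different hash code such as 22008504 also maps to 0, which is why distinct hash codes can still end up sharing a bucket.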

Now if you query whether car2 is in the Dictionary, it will calculate the hash and thus take the first bucket. Then it will iterate, performing an equality check on car1 vs car2, then car3 vs car2, reach the end of the bucket, and return false. This is because the default Equals operation is reference equality. The lookup succeeds only if you override Equals as well (for instance, if all cars are considered the same, Equals can simply return true).

As you noticed, car1.Equals(car2) isn't true. Dictionary and HashSet membership will only be true for objects that are equal, meaning that .Equals() returns true. Equals is only called once their hash codes have been found to be equal.
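The difference can be sketched with two versions of a car class. Both classes and the Model property are hypothetical stand-ins, since the OP's actual Car class is not shown here; the point is only that overriding GetHashCode alone leaves Equals as reference equality.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: overrides GetHashCode only, so Equals stays reference equality.
public sealed class CarHashOnly
{
    public string Model { get; }
    public CarHashOnly(string model) { Model = model; }
    public override int GetHashCode() { return Model.GetHashCode(); }
}

// Hypothetical class: overrides both, so cars with the same model count as the same key.
public sealed class CarWithEquals : IEquatable<CarWithEquals>
{
    public string Model { get; }
    public CarWithEquals(string model) { Model = model; }

    public bool Equals(CarWithEquals other) { return other != null && Model == other.Model; }
    public override bool Equals(object obj) { return Equals(obj as CarWithEquals); }
    public override int GetHashCode() { return Model.GetHashCode(); }
}

public static class CarDemo
{
    // Same hash code, but a different reference: the bucket is found, Equals says no.
    public static bool HashOnlyLookup()
    {
        var cars = new Dictionary<CarHashOnly, int> { { new CarHashOnly("A"), 1 } };
        return cars.ContainsKey(new CarHashOnly("A"));
    }

    // Same hash code and Equals agrees: the key is found.
    public static bool FullLookup()
    {
        var cars = new Dictionary<CarWithEquals, int> { { new CarWithEquals("A"), 1 } };
        return cars.ContainsKey(new CarWithEquals("A"));
    }
}
```

CarDemo.HashOnlyLookup() returns false and CarDemo.FullLookup() returns true, even though both classes produce identical hash codes for the same model.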
