Why does initial hash value in GetHashCode() implementation generated for anonymous class depend on property names?

Question

When generating the GetHashCode() implementation for an anonymous class, Roslyn computes the initial hash value from the property names. For example, the class generated for

var x = new { Int = 42, Text = "42" };

is going to have the following GetHashCode() method:

public override int GetHashCode()
{
   int hash = 339055328;
   hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode( Int );
   hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode( Text );
   return hash;
}

But if we change the property names, the initial value changes:

var x = new { Int2 = 42, Text2 = "42" };

public override int GetHashCode()
{
   int hash = 605502342;
   hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode( Int2 );
   hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode( Text2 );
   return hash;
}

What's the reason behind this behaviour? Is there some problem with just picking a big [prime?] number and using it for all the anonymous classes?

Answer 1:

Is there some problem with just picking a big [prime?] number and using it for all the anonymous classes?

There is nothing wrong with doing this; it just tends to produce less effective hash codes, because values of different anonymous types would collide more often.

The goal of a GetHashCode implementation is to return different results for values which are not equal. This decreases the chance of collisions when the values are used in hash based collections (such as Dictionary<TKey, TValue>).

An anonymous value can never be equal to another anonymous value if they represent different types. The type of an anonymous value is defined by the shape of the properties:

Name of properties
Type of properties
Count of properties

Two anonymous values which differ on any of these characteristics represent different types and hence can never be equal values.
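A small sketch of this rule, assuming a plain console program (the class and method names are illustrative, not from the original post). Two anonymous values that differ only in a property name get different compiler-generated types, so Equals returns false even though the stored values match:

```csharp
using System;

public static class AnonymousShapeDemo
{
    // True when both objects are instances of the same compiler-generated type.
    public static bool SameType(object left, object right) =>
        left.GetType() == right.GetType();

    public static void Main()
    {
        var original = new { a = 1 };
        var renamed = new { b = 1 };    // same value, different property name
        var sameShape = new { a = 1 };  // same name, type, and count as 'original'

        Console.WriteLine(SameType(original, renamed));    // False
        Console.WriteLine(original.Equals(renamed));       // False
        Console.WriteLine(SameType(original, sameShape));  // True
        Console.WriteLine(original.Equals(sameShape));     // True
    }
}
```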

Given that, it makes sense for the compiler to generate GetHashCode implementations which tend to return different values for different types. This is why the compiler includes the property names when computing the initial hash.
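To make the effect of the seed concrete, here is a hand-written sketch of the folding pattern from the question. The Combine helper, the shared seed of 17, and the field hashes 42 and 7 are assumptions made for illustration; only the multiplier and the two per-type seeds come from the generated code shown above. With one shared seed, equal field values always produce the same hash regardless of type, while distinct seeds keep them apart:

```csharp
using System;

public static class SeedSketch
{
    // Multiplier taken from the generated code shown in the question.
    private const int Factor = -1521134295;

    // Same folding step as the generated method: hash = hash * Factor + fieldHash.
    public static int Combine(int seed, params int[] fieldHashes)
    {
        int hash = seed;
        foreach (int fieldHash in fieldHashes)
        {
            hash = unchecked(hash * Factor + fieldHash);
        }
        return hash;
    }

    public static void Main()
    {
        int[] fields = { 42, 7 }; // stand-ins for the per-property hash codes

        // One shared seed (hypothetical value): two different "types"
        // holding equal values always end up with the same hash.
        const int sharedSeed = 17;
        Console.WriteLine(Combine(sharedSeed, fields) == Combine(sharedSeed, fields)); // True

        // Per-type seeds (the two constants from the question): the hashes differ.
        Console.WriteLine(Combine(339055328, fields) == Combine(605502342, fields)); // False
    }
}
```

Because the multiplier is odd, multiplying by it never maps two different 32-bit values to the same result, so two different seeds combined with identical field hashes always yield different final hashes in this sketch.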

Answer 2:

Unless someone from the Roslyn team steps up we can only speculate. I would have done it the same way. Using a different seed for each anonymous type seems like a useful way to have more randomness in the hash codes. For example it causes new { a = 1 }.GetHashCode() != new { b = 1 }.GetHashCode() to be true.

I also wonder whether there are any bad seeds that cause the hash code computation to fall apart. I don't think so. Even a 0 seed would work.
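A quick way to check the zero-seed intuition is a two-field version of the generated pattern that starts from 0. The class and method names are hypothetical, and plain int values stand in for the per-property hash codes:

```csharp
using System;

public static class ZeroSeedSketch
{
    // Multiplier taken from the generated code shown in the question.
    private const int Factor = -1521134295;

    // Two-field version of the generated pattern, starting from a seed of 0.
    public static int Hash(int first, int second)
    {
        int hash = 0;
        hash = unchecked(hash * Factor + first);
        hash = unchecked(hash * Factor + second);
        return hash;
    }

    public static void Main()
    {
        Console.WriteLine(Hash(1, 2));                // -1521134293
        Console.WriteLine(Hash(2, 1));                // 1252698707
        Console.WriteLine(Hash(1, 2) != Hash(2, 1));  // True
    }
}
```

The fields are still mixed and their order still matters. The one visible effect of a zero seed in this sketch is that a leading field hash of 0 contributes nothing: Hash(0, 5) is simply 5.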

The Roslyn source code can be found in AnonymousTypeGetHashCodeMethodSymbol. The initial hash code value is based on a hash of the property names of the anonymous type.
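The idea of deriving a seed from names can be sketched as follows. This is not Roslyn's code: HashName and SeedFor are hypothetical helpers, the FNV-1a variant over UTF-16 code units is an assumed stand-in for whatever string hash the compiler uses, and the exact strings the compiler hashes are not reproduced here, so the printed values should not be expected to match the constants in the question.

```csharp
using System;

public static class NameSeedSketch
{
    // Multiplier taken from the generated code shown in the question.
    private const int Factor = -1521134295;

    // 32-bit FNV-1a applied to the UTF-16 code units of a name
    // (an assumption for illustration, not the compiler's actual hash).
    public static int HashName(string name)
    {
        uint hash = 2166136261;
        foreach (char c in name)
        {
            hash = unchecked((hash ^ c) * 16777619);
        }
        return unchecked((int)hash);
    }

    // Folds the hash of every property name into one compile-time seed.
    public static int SeedFor(params string[] propertyNames)
    {
        int seed = 0;
        foreach (string name in propertyNames)
        {
            seed = unchecked(seed * Factor + HashName(name));
        }
        return seed;
    }

    public static void Main()
    {
        // The seed depends only on the names, so it can be emitted as a constant.
        Console.WriteLine(SeedFor("Int", "Text"));
        Console.WriteLine(SeedFor("Int2", "Text2"));
    }
}
```

Since the seed is a function of the names alone, a compiler following this approach can compute it once and emit it as a literal, which matches the constant first line of the generated methods in the question.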

Source: https://stackoverflow.com/questions/32808566/why-does-initial-hash-value-in-gethashcode-implementation-generated-for-anonym

Tags

roslyn