Why does initial hash value in GetHashCode() implementation generated for anonymous class depend on property names?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-23 10:15:24

Question


When generating the GetHashCode() implementation for an anonymous class, Roslyn computes the initial hash value from the property names. For example, the class generated for

var x = new { Int = 42, Text = "42" };

is going to have the following GetHashCode() method:

public override int GetHashCode()
{
   int hash = 339055328;
   hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode( Int );
   hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode( Text );
   return hash;
}

But if we change the property names, the initial value changes:

var x = new { Int2 = 42, Text2 = "42" };

public override int GetHashCode()
{
   int hash = 605502342;
   hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode( Int2 );
   hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode( Text2 );
   return hash;
}

What's the reason behind this behaviour? Is there some problem with just picking a big [prime?] number and using it for all the anonymous classes?


Answer 1:


"Is there some problem with just picking a big [prime?] number and using it for all the anonymous classes?"

There is nothing wrong with doing this; it just tends to produce less effective hash codes, because values of different anonymous types become more likely to collide.

The goal of a GetHashCode implementation is to return different results for values which are not equal. This decreases the chance of collisions when the values are used in hash based collections (such as Dictionary<TKey, TValue>).

An anonymous value can never be equal to another anonymous value if they represent different types. The type of an anonymous value is defined by the shape of the properties:

  • Name of properties
  • Type of properties
  • Count of properties

Two anonymous values which differ on any of these characteristics represent different types and hence can never be equal values.

Given this, it makes sense for the compiler to generate GetHashCode implementations which tend to return different values for different types. This is why the compiler includes the property names when computing the initial hash.
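To make this concrete, here is a small sketch using the two anonymous types from the question. It assumes a compiler that produces the seeds shown there; the exact seed values are an implementation detail and may differ between compiler versions.

```csharp
using System;

class Program
{
    static void Main()
    {
        var x = new { Int = 42, Text = "42" };
        var y = new { Int2 = 42, Text2 = "42" };
        var z = new { Int = 42, Text = "42" };

        // Same anonymous type and equal property values:
        // the values are equal and must share a hash code.
        Console.WriteLine(x.Equals(z));                         // True
        Console.WriteLine(x.GetHashCode() == z.GetHashCode());  // True

        // Different property names mean different anonymous types,
        // so the values can never be equal.
        Console.WriteLine(x.Equals(y));                         // False

        // Because the seeds differ, the hash codes differ as well,
        // even though the property values are identical
        // (False with the seeds shown in the question).
        Console.WriteLine(x.GetHashCode() == y.GetHashCode());
    }
}
```

With a single shared seed, x and y would get the same hash code here, which is legal but causes an avoidable collision whenever values of both types end up as keys in the same Dictionary<object, TValue>.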




Answer 2:


Unless someone from the Roslyn team steps up we can only speculate. I would have done it the same way. Using a different seed for each anonymous type seems like a useful way to have more randomness in the hash codes. For example it causes new { a = 1 }.GetHashCode() != new { b = 1 }.GetHashCode() to be true.

I also wonder whether there are any bad seeds that cause the hash code computation to fall apart. I don't think so. Even a 0 seed would work.
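As a rough check of that intuition, the sketch below re-implements the shape of the generated method by hand, with the seed passed in as a parameter. HashSketch and Combine are hypothetical names used only for illustration; they are not part of Roslyn or of the generated code.

```csharp
using System;
using System.Collections.Generic;

static class HashSketch
{
    // Hypothetical helper that mirrors the generated GetHashCode() from the
    // question, except that the seed is an argument instead of a constant.
    public static int Combine(int seed, int number, string text)
    {
        unchecked // overflow is expected and wraps around
        {
            int hash = seed;
            hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode(number);
            hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode(text);
            return hash;
        }
    }

    static void Main()
    {
        // A seed of 0 still separates different property values.
        Console.WriteLine(Combine(0, 42, "42") == Combine(0, 43, "42"));   // False

        // Different seeds separate identical property values,
        // which is what per-type seeds achieve.
        Console.WriteLine(
            Combine(339055328, 42, "42") == Combine(605502342, 42, "42")); // False
    }
}
```

Both comparisons print False because the multiplier is odd, so multiplying by it never maps two different 32-bit values to the same result. The seed therefore does not affect how well values of one type are spread out; it only shifts the results of one type relative to another.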

The relevant Roslyn source code can be found in AnonymousTypeGetHashCodeMethodSymbol. The initial hash code value is based on a hash of the property names of the anonymous type.



Source: https://stackoverflow.com/questions/32808566/why-does-initial-hash-value-in-gethashcode-implementation-generated-for-anonym
