How to implement GetHashCode for this situation?

我怕爱的太早我们不能终老 提交于 2019-12-11 09:31:13

问题


I'm trying to implement an IEqualityComparer<string> which basically compares two strings in a way that,(let's assume we have two strings x and y) if x starts with y or y starts with x they should be treated as equal.

public bool Equals(string x, string y)
{
    return x.StartsWith(y) || y.StartsWith(x);
}

public int GetHashCode(string obj)
{
    return obj.GetHashCode();
}

Ofcourse implementing the Equals method is pretty easy.But the GetHashCode is not, I couldn't think any way to implement it correctly.I have written a test program like this:

string[] values = {"hell", "hello", "foo", "fooooo"};

var result = values.Distinct(new StringComparer());

foreach(var x in result)
   Console.WriteLine(x);

And I get the wrong result because of GetHashCode:

hell
hello
foo
fooooo

Obviously I can force calling Equals method by returning same value from the GetHashCode for all values but I wanna know if there is another way to implement it because the performance is critical. Is there a way to implement GetHashCode method correctly for my situation ?

Note: I know it is vague but I couldn't find a better title, if you have a better idea you are free to edit.


Edit: I'm going to use this logic with web urls. In my situation first 20 characters are equal. For example:

http://www.foo.com/bar?id=3
http://www.foo.com/bar?id=3&fooId=23

回答1:


The issue is in your definition of equality: Equality must be transitive. But it is not in your case. Take the following three values:

* f
* freeze
* foo

Then f == freeze, and foo == f, but freeze != foo.

See also MSDN on Implementing the Equals Method, which says:

(x.Equals(y) && y.Equals(z)) returns true if and only if x.Equals(z) returns true.

A proper definition of equality produces distinct sets of values that are considered equal. If you had those, you could define a "canonical" representation for each set and calculate the hash of the canonical value so each set would have its hash code. But this only works with an operation that is transitive (as well as commutative and reflexive, these two properties are covered by your definition).

Since your definition of equality is not transitive you cannot define such sets so you can't find a proper hash code either.

But it raises other questions, too. Taking your example:

string[] values = { "hell", "hello", "foo", "fooooo" };
var result = values.Distinct(new StringComparer());

Which values do you expect to go into your result? Do you always want the shortest version? This will not be guaranteed by your code, the result will depend on the internal implementation of Distinct.

Implementing an EqualityComparer might possibly be a sub-optimal approach to your actual issue. What are you trying to achieve?




回答2:


As strings are equal to each other depending on what string you compare them with, any string can be equal to another. Thus there is only one way to implement the GetHashCode method; return the same value for all strings:

public int GetHashCode(string obj) {
  return 0;
}

This will naturally give a horrible distribution. A dictionary will have a O(n) lookup time instead of O(1), but it works, and it's the only way to make it work for such an equality comparison.



来源:https://stackoverflow.com/questions/25524127/how-to-implement-gethashcode-for-this-situation

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!