Question
I’m passing a name string and its SHA1 value into a database. The SHA value is used as an index for searches. After the implementation was done, we got the requirement to make searching the name case insensitive. We do need to take all languages into account (Chinese characters are a real use case).
I know about the Turkey Test. How can I transform my input string before hashing so that the hash is case insensitive? Ideally I’d like it to be the equivalent of InvariantCultureIgnoreCase.
In other words, how do I make the output of this function case insensitive?
    private byte[] ComputeHash(string s)
    {
        byte[] data = System.Text.Encoding.Unicode.GetBytes(s ?? string.Empty);
        SHA1 sha = new SHA1CryptoServiceProvider(); // returns 160 bit value
        return sha.ComputeHash(data);
    }
If SHA isn’t possible, I might be able to make String.GetHashCode() work, but I don’t see a way to make that case insensitive either.
I'm betting this isn't possible. If it's not, what are some workarounds?
Answer 1:
The existing answers that suggest ToLower(Invariant) are wrong: comparing strings after calling ToLower is not equivalent to calling string.Compare with one of the xxxIgnoreCase options. See the accepted answer to "String comparison - strA.ToLower()==strB.ToLower() or strA.Equals(strB,StringComparisonType)?": the ToLower approach breaks down for certain kinds of characters.
The solution is to create a so-called SortKey for every string. A SortKey is essentially a byte array with the property that equal bytes mean equal strings. (SortKeys can also be compared byte by byte, yielding exactly the same order that string.Compare yields, but we don't need that property here.)
Summary: Use CompareInfo.GetSortKey(string).KeyData to get a hashable byte[]. (GetSortKey on MSDN) This works for all possible cultures. It also works for case-insensitivity.
So a case-insensitive hash for any given string (even with the Turkish i) can be obtained with:
    var sortKeyBytes = CultureInfo.InvariantCulture.CompareInfo.GetSortKey(anyString,
        CompareOptions.IgnoreCase).KeyData;
    int hashCode = HashByteArray(sortKeyBytes); // Need to provide this function.
...
We can't use GetHashCode() of byte[], as this method is not overridden for byte[] and therefore defaults to object.GetHashCode(), which uses object identity and not value.
You can use the hash function from this answer. It's not good but it does the job.
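To tie this back to the question's ComputeHash, here is a minimal sketch of how the sort key could be fed into SHA1 instead of the raw string bytes. The class name CaseInsensitiveHash is hypothetical, and the HashByteArray body shown here is only a stand-in (a plain FNV-1a hash), not the function from the linked answer. One assumption worth checking before persisting these hashes: sort key bytes are produced by the platform's collation library, so they may differ between operating systems and .NET versions.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;

public static class CaseInsensitiveHash
{
    // Hypothetical replacement for the question's ComputeHash:
    // SHA-1 over the invariant-culture, ignore-case sort key.
    public static byte[] ComputeHash(string s)
    {
        SortKey key = CultureInfo.InvariantCulture.CompareInfo
            .GetSortKey(s ?? string.Empty, CompareOptions.IgnoreCase);
        using (SHA1 sha = SHA1.Create())
        {
            return sha.ComputeHash(key.KeyData); // 160-bit value
        }
    }

    // Stand-in for HashByteArray: 32-bit FNV-1a over the bytes.
    // Any value-based hash of the array would do here.
    public static int HashByteArray(byte[] data)
    {
        unchecked
        {
            uint hash = 2166136261; // FNV offset basis
            foreach (byte b in data)
            {
                hash ^= b;
                hash *= 16777619;   // FNV prime
            }
            return (int)hash;
        }
    }
}
```

With this, CaseInsensitiveHash.ComputeHash("Name") and CaseInsensitiveHash.ComputeHash("NAME") should yield the same bytes, so the stored SHA value can still serve as the search index.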
Answer 2:
You could use s.ToUpperInvariant() prior to generating the hash. As long as you do it both ways (generating the original hash, and generating a hash to test against the original), it will work.
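As a rough sketch of that approach applied to the question's function (the class name UpperInvariantHash is hypothetical), the string is upper-cased with the invariant culture before its bytes are hashed. Note that this gives simple case folding only; it is not guaranteed to match InvariantCultureIgnoreCase for every character.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class UpperInvariantHash
{
    // Normalizes case with the invariant culture, then hashes the UTF-16 bytes.
    // The same method must be used when storing and when searching.
    public static byte[] ComputeHash(string s)
    {
        string normalized = (s ?? string.Empty).ToUpperInvariant();
        byte[] data = Encoding.Unicode.GetBytes(normalized);
        using (SHA1 sha = SHA1.Create())
        {
            return sha.ComputeHash(data); // 160-bit value
        }
    }
}
```

Characters without case, such as Chinese characters, pass through ToUpperInvariant unchanged, so their hashes are the same as they would be without the normalization step.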
Answer 3:
To make something case insensitive, remove the case:
    s = s.ToLowerInvariant();
Do not use CurrentCulture unless you can store the culture in the database and reuse it when converting other strings for matching, as in:
    s = s.ToLower(System.Globalization.CultureInfo.CurrentCulture);
You may consider using another (non-Invariant) culture all the time, but that could surprise a future code maintainer (one normally expects either the Current or the Invariant culture for all string operations).
Source: https://stackoverflow.com/questions/10452228/case-insensitive-hash-sha-of-a-string