Ignoring accented letters in string comparison

一向 2020-11-22 10:37

I need to compare 2 strings in C# and treat accented letters the same as non-accented letters. For example:

string s1 = "hello";
string s2 = "héllo";

s1         


        
6 Answers
  •  再見小時候
    2020-11-22 11:11

    EDIT 2012-01-20: Oh boy! The solution was so much simpler and has been in the framework nearly forever. As pointed out by knightpfhor:

    string.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);
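
As a rough sketch of how that call might be wrapped for an equality check: the class and method names below (`AccentInsensitiveCompare`, `EqualsIgnoringAccents`) are made up for illustration, and the choice of culture is left to the caller. `string.Compare` returns 0 when the two strings are considered equal under the given options.

```csharp
using System;
using System.Globalization;

static class AccentInsensitiveCompare
{
    // Hypothetical helper: true when a and b differ only by non-spacing
    // (combining) marks such as accents, under the given culture.
    public static bool EqualsIgnoringAccents(string a, string b, CultureInfo culture)
    {
        return string.Compare(a, b, culture, CompareOptions.IgnoreNonSpace) == 0;
    }

    // Hypothetical variant that also ignores case by combining two flags.
    public static bool EqualsIgnoringAccentsAndCase(string a, string b, CultureInfo culture)
    {
        CompareOptions options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;
        return string.Compare(a, b, culture, options) == 0;
    }
}
```

With the strings from the question, `AccentInsensitiveCompare.EqualsIgnoringAccents("hello", "héllo", CultureInfo.InvariantCulture)` should report a match. `IgnoreNonSpace` on its own leaves case significant, which is why the second variant adds `IgnoreCase`.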
    

    Here's a function that strips diacritics from a string:

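    // Requires: using System.Globalization; using System.Text;
    static string RemoveDiacritics(string text)
    {
      // Decompose each accented letter into its base letter plus combining marks.
      string formD = text.Normalize(NormalizationForm.FormD);
      StringBuilder sb = new StringBuilder();
    
      foreach (char ch in formD)
      {
        // Keep everything except the combining (non-spacing) marks.
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
          sb.Append(ch);
        }
      }
    
      // Recompose whatever is left into the usual precomposed form.
      return sb.ToString().Normalize(NormalizationForm.FormC);
    }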
    

    More details on MichKap's blog (RIP...).

    The principle is that it turns 'é' into 2 successive chars: 'e' followed by a combining acute accent. It then iterates through the chars and skips the diacritics.

    "héllo" becomes "he" + U+0301 (combining acute accent) + "llo", which in turn becomes "hello".

    Debug.Assert("hello"==RemoveDiacritics("héllo"));
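
To make the decomposition step visible, here is a small sketch that lists each char of the FormD form together with its Unicode category. The class and method names (`DecompositionDemo`, `Describe`) are invented for this illustration and are not part of the framework.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DecompositionDemo
{
    // Hypothetical helper: returns one "U+XXXX:Category" entry per char
    // of the canonically decomposed (FormD) version of the input.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char ch in text.Normalize(NormalizationForm.FormD))
        {
            sb.AppendFormat("U+{0:X4}:{1} ", (int)ch, CharUnicodeInfo.GetUnicodeCategory(ch));
        }
        return sb.ToString().TrimEnd();
    }
}
```

For the single character 'é' (U+00E9), `DecompositionDemo.Describe("\u00E9")` yields `U+0065:LowercaseLetter U+0301:NonSpacingMark`. The second entry is the one that RemoveDiacritics filters out.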
    

    Note: Here's a more compact .NET4+ friendly version of the same function:

    // Requires: using System.Globalization; using System.Linq; using System.Text;
    static string RemoveDiacritics(string text)
    {
      return string.Concat( 
          text.Normalize(NormalizationForm.FormD)
          .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch)!=
                                        UnicodeCategory.NonSpacingMark)
        ).Normalize(NormalizationForm.FormC);
    }
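
To tie this back to the original question, one possible way to use the function for comparison is to strip both strings and then compare the results ordinally. This is only a sketch: `DiacriticComparison` and `EqualsIgnoringAccents` are hypothetical names, and RemoveDiacritics is repeated from the compact version above so that the snippet compiles on its own.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticComparison
{
    // Same logic as the compact version above.
    public static string RemoveDiacritics(string text)
    {
        return string.Concat(
            text.Normalize(NormalizationForm.FormD)
                .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
                             UnicodeCategory.NonSpacingMark)
        ).Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: strips diacritics from both sides, then does a
    // culture-independent, case-sensitive comparison of what is left.
    public static bool EqualsIgnoringAccents(string a, string b)
    {
        return string.Equals(RemoveDiacritics(a), RemoveDiacritics(b),
                             StringComparison.Ordinal);
    }
}
```

Compared with the string.Compare approach, this variant does not depend on a culture, and the stripped strings can be stored or used as dictionary keys. The trade-off is that it allocates new strings on every call.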
    
