Unexpected behavior when sorting strings with letters and dashes

时光说笑 2020-12-03 17:10

If I have a list of strings that contain only digits and dashes, they sort in ascending order like so:

s = s.OrderBy(t => t).ToList();

66-06162

2 Answers
  • 2020-12-03 17:53

    Here is the remark from MSDN:

    Character sets include ignorable characters. The Compare(String, String) method does not consider such characters when it performs a culture-sensitive comparison. For example, if the following code is run on the .NET Framework 4 or later, a culture-sensitive comparison of "animal" with "ani-mal" (using a soft hyphen, or U+00AD) indicates that the two strings are equivalent.

    So it looks like you are hitting this ignorable-character case. If we assume that the - symbol carries almost no weight in the comparison, the strings are effectively compared as if the hyphens were not there, and the sorted results look like this.

    First case:

    660616280000
    660616280100
    6606162801000
    6606162801040
    

    Second case:

    66061628000A
    660616280100A
    660616280104A
    66061628010A 
    

    Which makes sense: with the hyphens dropped, both lists are in plain ascending order.
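
    The "hyphen has almost no weight" reading can be approximated with a minimal sketch that sorts on a key with the hyphens stripped out and compares the keys ordinally. This is only an emulation, not what the culture-sensitive comparer does internally. The class and method names are made up for illustration, and the input strings are the ones listed in the other answer.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class IgnoredHyphenDemo
    {
        // Hypothetical helper: approximates "the hyphen has no weight" by
        // sorting on a key with every '-' removed, compared ordinally.
        public static List<string> SortAsIfHyphensIgnored(IEnumerable<string> items)
        {
            return items
                .OrderBy(s => s.Replace("-", string.Empty), StringComparer.Ordinal)
                .ToList();
        }

        public static void Main()
        {
            var input = new[]
            {
                "66-0616280-00A",
                "66-0616280-10A",
                "66-0616280100A",
                "66-0616280104A",
            };

            // Print the strings in the emulated order.
            foreach (var s in SortAsIfHyphensIgnored(input))
            {
                Console.WriteLine(s);
            }
        }
    }
    ```

    This prints 66-0616280-00A, 66-0616280100A, 66-0616280104A, 66-0616280-10A, which is the "Second case" order above with the hyphens put back.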

  • 2020-12-03 18:06

    It's because the default string comparison is culture-sensitive. As far as I can tell, Comparer<string>.Default delegates to string.CompareTo(string), which uses the current culture:

    This method performs a word (case-sensitive and culture-sensitive) comparison using the current culture. For more information about word, string, and ordinal sorts, see System.Globalization.CompareOptions.

    Then the page for CompareOptions includes:

    The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.

    ("Small weight" isn't quite the same as "ignored" as quoted in Andrei's answer, but the effects are similar here.)
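
    To try the three sort modes side by side, here is a minimal sketch using CompareInfo.Compare with an explicit CompareOptions value. It assumes the invariant culture rather than the current culture, and the class and method names are made up for illustration. Culture-sensitive results can vary with the runtime and its globalization data, so no output is claimed for the word-sort and string-sort lines.

    ```csharp
    using System;
    using System.Globalization;

    public static class SortModesDemo
    {
        // Returns -1, 0, or 1 for the requested comparison mode.
        // Assumption: invariant culture, chosen only to keep the sketch
        // independent of the machine's current culture.
        public static int CompareSign(string a, string b, CompareOptions options)
        {
            CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
            return Math.Sign(compareInfo.Compare(a, b, options));
        }

        public static void Main()
        {
            const string a = "66-0616280104A";
            const string b = "66-0616280-10A";

            // CompareOptions.None selects the default word sort.
            Console.WriteLine($"Word sort:    {CompareSign(a, b, CompareOptions.None)}");
            // StringSort removes the special weighting of the hyphen.
            Console.WriteLine($"String sort:  {CompareSign(a, b, CompareOptions.StringSort)}");
            // Ordinal compares the raw UTF-16 code units.
            Console.WriteLine($"Ordinal sort: {CompareSign(a, b, CompareOptions.Ordinal)}");
        }
    }
    ```

    Only the ordinal line is fully predictable: '1' (U+0031) is greater than '-' (U+002D) at the first differing position, so it prints 1.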

    If you specify StringComparer.Ordinal, you get results of:

    66-0616280-00A
    66-0616280-10A
    66-0616280100A
    66-0616280104A
    

    Specify it as the second argument to OrderBy:

    s = s.OrderBy(t => t, StringComparer.Ordinal).ToList();
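
    As a self-contained variant of the line above, here is a minimal sketch that sorts a copy of the list in place with List<T>.Sort and the same comparer. The class and method names are made up for illustration, and the input is deliberately given in the culture-sensitive order.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class OrdinalSortDemo
    {
        // Hypothetical helper: copies the input and sorts the copy in place,
        // comparing strings by their UTF-16 code units.
        public static List<string> SortOrdinal(IEnumerable<string> items)
        {
            var copy = new List<string>(items);
            copy.Sort(StringComparer.Ordinal);
            return copy;
        }

        public static void Main()
        {
            var input = new List<string>
            {
                "66-0616280-00A",
                "66-0616280100A",
                "66-0616280104A",
                "66-0616280-10A",
            };

            // Print the strings in ordinal order.
            foreach (var s in SortOrdinal(input))
            {
                Console.WriteLine(s);
            }
        }
    }
    ```

    The output is the ordinal listing shown above: both hyphenated entries come first because '-' (U+002D) sorts before every digit.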
    

    You can see the difference here:

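    // Culture-sensitive (current culture): the hyphen gets little or no weight.
    Console.WriteLine(Comparer<string>.Default.Compare(
        "66-0616280104A", "66-0616280-10A"));
    // Ordinal: '1' (U+0031) sorts after '-' (U+002D), so the result is positive.
    Console.WriteLine(StringComparer.Ordinal.Compare(
        "66-0616280104A", "66-0616280-10A"));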
    