Why does string.Compare seem to handle accented characters inconsistently?

孤城傲影 2020-12-10 02:52

If I execute the following statement:

string.Compare("mun", "mün", true, CultureInfo.InvariantCulture)

The result is '-1', indicating that "mun" sorts before "mün".

3 Answers
  •  猫巷女王i
    2020-12-10 03:22

    There is a tie-breaking algorithm at work; see http://unicode.org/reports/tr10/, which describes it as follows:

    To address the complexities of language-sensitive sorting, a multilevel comparison algorithm is employed. In comparing two words, for example, the most important feature is the base character: such as the difference between an A and a B. Accent differences are typically ignored, if there are any differences in the base letters. Case differences (uppercase versus lowercase), are typically ignored, if there are any differences in the base or accents. Punctuation is variable. In some situations a punctuation character is treated like a base character. In other situations, it should be ignored if there are any base, accent, or case differences. There may also be a final, tie-breaking level, whereby if there are no other differences at all in the string, the (normalized) code point order is used.

    So, "Munt..." and "Münc..." are alphabetically different and sort based on the "t" and "c".

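    Whereas "mun" and "mün" are alphabetically the same at the base-letter level ("u" is equivalent to "ü" in most languages), so the comparison falls through to the lower levels (accent differences and, ultimately, the character codes), where "u" sorts before "ü".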
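
The multilevel idea in the answer above can be sketched in a few lines of Python. This is only a toy illustration, not the .NET or Unicode Collation Algorithm implementation: the names `sort_key` and `compare` are made up for this example, and the levels are approximated by decomposing each string and separating base letters from combining accents.

```python
import unicodedata


def sort_key(s):
    """Toy multilevel key: base letters, then accents, then case, then code points."""
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and secondary:
            # A combining mark is attached to the base letter before it (level 2).
            secondary[-1] += (ch,)
        else:
            primary.append(ch.casefold())   # level 1: base letter, case ignored
            secondary.append(())            # level 2: no accents seen yet
            tertiary.append(ch.isupper())   # level 3: lowercase before uppercase
    # Final tie-break: code point order of the normalized string.
    return (primary, secondary, tertiary, unicodedata.normalize("NFC", s))


def compare(a, b, ignore_case=False):
    """Return -1, 0 or 1 (hypothetical helper, not a real library API)."""
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


# Base letters are identical, so the accent level decides.
print(compare("mun", "mün", ignore_case=True))
# Base letters differ ("t" vs "c"), so the accent is never consulted.
print(compare("Munt", "Münc", ignore_case=True))
```

Because Python compares tuples element by element, a difference at an earlier level always wins over anything at a later level, which is the behaviour the quoted passage describes.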
