Why doesn't ICU4J match UTF-8 sort order?

我在风中等你 2021-01-26 15:42

I am having a hard time understanding Unicode sort order.

When I run Collator.getInstance(Locale.ENGLISH).compare("_", "#") under ICU4J 55.1 I get a

2 answers
  •  甜味超标
    2021-01-26 16:41

    Converting Mark Ransom's comments into an answer:

    • The ordering of individual characters is based on a collation table, which has little relationship to the code point numbers. See: http://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table
    • If you follow the first link on that page, it leads to allkeys.txt, which gives the default collation ordering.
    • In particular, _ is 005F ; [*020B.0020.0002] # LOW LINE while # is 0023 ; [*0391.0020.0002] # NUMBER SIGN. Note that the primary collation weight for _ (020B) is lower than the one for # (0391), so _ sorts before # even though its code point (005F) is higher than that of # (0023).
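
To make the point above concrete, here is a minimal sketch in plain Java (no ICU4J dependency) that parses the two allkeys.txt entries quoted in the answer and compares them both ways. The class name CollationWeightDemo and its helper methods are hypothetical, made up for illustration. It looks only at the primary weight of the first collation element, which is a simplification: a real collator compares full multi-level sort keys, and ICU uses the CLDR root collation rather than the raw table, so the weights may differ by version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollationWeightDemo {
    // Matches the code point and the first collation element of an
    // allkeys.txt-style line, e.g. "005F ; [*020B.0020.0002] # LOW LINE".
    // Group 1 = code point, group 2 = primary weight (both hexadecimal).
    private static final Pattern ENTRY = Pattern.compile(
        "^\\s*([0-9A-Fa-f]{4,6})\\s*;\\s*\\[[.*]([0-9A-Fa-f]{4})\\.[0-9A-Fa-f]{4}\\.[0-9A-Fa-f]{4}\\]");

    private static Matcher parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a collation table entry: " + line);
        }
        return m;
    }

    // Code point of the entry, as an int.
    static int codePoint(String line) {
        return Integer.parseInt(parse(line).group(1), 16);
    }

    // Primary weight of the entry's first collation element.
    static int primaryWeight(String line) {
        return Integer.parseInt(parse(line).group(2), 16);
    }

    // Ordering by raw code point (what a naive binary sort would do).
    static int compareByCodePoint(String a, String b) {
        return Integer.compare(codePoint(a), codePoint(b));
    }

    // Ordering by primary collation weight only (simplified).
    static int compareByPrimaryWeight(String a, String b) {
        return Integer.compare(primaryWeight(a), primaryWeight(b));
    }

    public static void main(String[] args) {
        // The two entries exactly as quoted in the answer.
        String lowLine = "005F ; [*020B.0020.0002] # LOW LINE";
        String numberSign = "0023 ; [*0391.0020.0002] # NUMBER SIGN";

        // Positive: by code point, _ (005F) comes after # (0023).
        System.out.println("code point order:     " + compareByCodePoint(lowLine, numberSign));
        // Negative: by primary weight, _ (020B) comes before # (0391).
        System.out.println("primary weight order: " + compareByPrimaryWeight(lowLine, numberSign));
    }
}
```

The two comparisons disagree in sign, which is the mismatch the question asks about: the collator follows the table weights, not the code point values.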
