Comparing a char to a code-point?

不知归路 2020-12-01 02:15

What is the "correct" way to compare a code point to a Java character? For example:

int codepoint = str.codePointAt(0); // str is any String instance; codePointAt is an instance method
char token = '\n';


        
5 Answers
  •  星月不相逢
    2020-12-01 02:32

    Java represents characters with a 16-bit (UTF-16) model, so any character whose code point is above 0xFFFF is stored in a string as a surrogate pair: two 16-bit char values (a high surrogate followed by a low surrogate) that together encode the single code point.
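
For the comparison asked about in the question, a minimal sketch follows. It relies on the fact that a `char` widens to `int` implicitly, so no cast is needed. The class and method names (`CodePointCompare`, `startsWithChar`, `needsSurrogatePair`) are made up for illustration and are not part of any library.

```java
final class CodePointCompare {
    /** True if the code point at index 0 of s equals the given char. */
    static boolean startsWithChar(String s, char token) {
        if (s.isEmpty()) {
            return false;
        }
        int codePoint = s.codePointAt(0);
        // char widens to int implicitly, so a plain == is enough.
        return codePoint == token;
    }

    /** True if cp lies above 0xFFFF and so needs two chars in a String. */
    static boolean needsSurrogatePair(int cp) {
        return Character.isSupplementaryCodePoint(cp);
    }

    public static void main(String[] args) {
        // U+1F600, written as its high and low surrogate
        String grinning = "\uD83D\uDE00";
        System.out.println(startsWithChar("\nabc", '\n'));         // true
        System.out.println(grinning.length());                     // 2
        System.out.println(grinning.codePointAt(0) == 0x1F600);    // true
        System.out.println(startsWithChar(grinning, '\uD83D'));    // false
    }
}
```

The last line shows why the distinction matters: the first `char` of a well-formed surrogate pair does not equal the code point that the pair encodes.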

    If you want to handle characters and strings properly according to the full Unicode standard, you need to take this into account and process strings by code point rather than by individual char.
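
One way to process a string with surrogate pairs in mind is to advance through it by code point rather than by `char`. The sketch below uses only `String.codePointAt` and `Character.charCount` from the standard library; the class `CodePointWalk` and its `count` method are hypothetical names chosen for this example.

```java
final class CodePointWalk {
    /** Counts how often the code point target occurs in s. */
    static int count(String s, int target) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == target) {
                n++;
            }
            // Advance by 1 char for BMP code points, by 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a', newline, U+1F600, newline
        String text = "a\n\uD83D\uDE00\n";
        System.out.println(count(text, '\n'));    // 2
        System.out.println(count(text, 0x1F600)); // 1
    }
}
```

Because `target` is an `int`, the same method accepts either a `char` such as `'\n'` or a supplementary code point such as `0x1F600`.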

    XML cares a lot about this; it's useful to access the XMLChar class in Xerces (which comes with Java version 5.0 and higher) for character-related code.

    It's also instructive to look at the Saxon XSLT/XQuery processor: as a well-behaved XML application, it has to account for how Java stores code points in strings. XQuery 1.0 and XPath 2.0 define the functions codepoints-to-string and string-to-codepoints; it may be worth getting a copy of Saxon and experimenting with them to see how they work.
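
As a rough plain-Java analogue of those two functions (not Saxon's implementation, which is not shown here), the following sketch assumes Java 8 or later for `String.codePoints()`. The class and method names are invented for illustration.

```java
import java.util.Arrays;

final class CodePointConvert {
    /** Analogue of string-to-codepoints: one int per Unicode code point. */
    static int[] stringToCodePoints(String s) {
        return s.codePoints().toArray();
    }

    /** Analogue of codepoints-to-string: builds a String from code points. */
    static String codePointsToString(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // 'A' followed by U+1F600: three chars, two code points
        String s = "A\uD83D\uDE00";
        int[] cps = stringToCodePoints(s);
        System.out.println(s.length());                       // 3
        System.out.println(Arrays.toString(cps));             // [65, 128512]
        System.out.println(codePointsToString(cps).equals(s)); // true
    }
}
```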
