Java charAt used with characters that have two code units

梦毁少年i 2020-12-03 14:30

From Core Java, vol. 1, 9th ed., p. 69:

The character ℤ requires two code units in the UTF-16 encoding. Calling

String sentence =         


        
4 Answers
  •  一整个雨季
    2020-12-03 14:55

    It sounds like the book is saying that 'ℤ' is not in the Basic Multilingual Plane (BMP), but in fact it is.

    Java uses UTF-16, with surrogate pairs for characters that are not in the BMP. Since 'ℤ' (U+2124) is in the BMP, it is represented by a single code unit. In your example, sentence.charAt(0) will return 'ℤ' and sentence.charAt(1) will return ' '.

    A character represented by a surrogate pair is made up of two code units. If the string began with such a character, sentence.charAt(0) would return the first code unit (the high surrogate) and sentence.charAt(1) would return the second code unit (the low surrogate).
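The difference can be checked directly. The following is a minimal sketch, not the book's example: the sample strings, the class name CharAtDemo, and the helper unitAt are made up for illustration, and U+1D546 is simply one supplementary character picked to stand in for "a character that needs a surrogate pair".

```java
class CharAtDemo {
    // Formats the UTF-16 code unit at the given index as four hex digits.
    // (Hypothetical helper, introduced only for this illustration.)
    static String unitAt(String s, int index) {
        return String.format("%04X", (int) s.charAt(index));
    }

    public static void main(String[] args) {
        // U+2124 lies in the BMP, so it occupies a single code unit.
        String bmp = "\u2124 abc"; // hypothetical sample text
        System.out.println(unitAt(bmp, 0));        // 2124
        System.out.println(bmp.charAt(1) == ' ');  // true

        // U+1D546 is a supplementary character; in UTF-16 it is the pair D835 DD46.
        String supp = "\uD835\uDD46 abc"; // hypothetical sample text
        System.out.println(unitAt(supp, 0));       // D835 (high surrogate)
        System.out.println(unitAt(supp, 1));       // DD46 (low surrogate)
        System.out.println(Character.isHighSurrogate(supp.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(supp.charAt(1)));  // true
        System.out.println(supp.charAt(2) == ' '); // true: the space is at index 2
    }
}
```

With the BMP character the space sits at index 1; with the supplementary character it moves to index 2, because the first two positions hold the two halves of the pair.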

    See http://docs.oracle.com/javase/6/docs/api/java/lang/String.html:

    A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
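Because index values refer to code units, looking up the n-th character in the Unicode sense takes the code point methods on String rather than charAt. A rough sketch under the same assumptions as before (the class name CodePointDemo, the helper codePointAtLogicalIndex, and the sample string are hypothetical):

```java
class CodePointDemo {
    // Returns the n-th code point (0-based), counting a surrogate pair as one character.
    // (Hypothetical helper, introduced only for this illustration.)
    static int codePointAtLogicalIndex(String s, int n) {
        // Translate a code point offset into a code unit index, then read there.
        int unitIndex = s.offsetByCodePoints(0, n);
        return s.codePointAt(unitIndex);
    }

    public static void main(String[] args) {
        String s = "\uD835\uDD46 x"; // U+1D546, a space, then 'x'

        System.out.println(s.length());                      // 4 code units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points

        // The first code point is the whole supplementary character.
        System.out.println(Integer.toHexString(codePointAtLogicalIndex(s, 0))); // 1d546

        // The second code point is the space, even though it sits at code unit index 2.
        System.out.println(codePointAtLogicalIndex(s, 1) == ' '); // true
    }
}
```

String.offsetByCodePoints walks the string from the given start, so repeated lookups this way are linear each time; for a full pass over a string, advancing the index by Character.charCount(codePoint) is the usual alternative.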
