Clarifying Java's evolutionary support of Unicode [closed]

↘锁芯ラ posted on 2019-12-01 07:45:46

Question


I'm finding Java's differentiation of char and codepoint to be strange and out of place.

For example, a string is an array of characters, or "letters which appear in an alphabet", in contrast to a codepoint, which MAY be a single letter or possibly a composite or surrogate pair. However, Java defines a character of a string as a char, which cannot be a composite or hold a full surrogate pair, and defines a codepoint as an int (this is fine).

But then length() seems to return the number of codepoints, while codePointCount() also returns the number of codepoints but combines composite characters, which ends up not really being the real count of codepoints?
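
For reference, here is a minimal sketch of what the two methods count on a current JDK. The class name and sample strings are only illustrative. As far as the String documentation goes, length() counts 16-bit char values (UTF-16 code units), while codePointCount() counts codepoints, so a surrogate pair counts once but a base letter and its combining mark still count as two:

```java
class LengthVersusCodePoints {
    // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it as a surrogate pair.
    static final String EMOJI = new String(Character.toChars(0x1F600));
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two codepoints that render as one letter.
    static final String COMPOSITE = "e\u0301";

    // Number of Unicode codepoints in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(EMOJI.length());         // 2 (two chars, one surrogate pair)
        System.out.println(codePoints(EMOJI));      // 1 (the pair is one codepoint)
        System.out.println(COMPOSITE.length());     // 2
        System.out.println(codePoints(COMPOSITE));  // 2 (combining marks are not merged)
    }
}
```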

It feels as though charAt() should return a String so that composites and surrogates are brought along and the result of length() should swap with codePointCount().
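
Something close to a "charAt() that returns a String" can be sketched with the existing API. The helper below, split(), is hypothetical (it is not part of the JDK); it assumes Java 8 or later for codePoints() and yields one String per codepoint, so a surrogate pair stays together:

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointStrings {
    // Hypothetical helper: one String per codepoint, so surrogate pairs are never split.
    static List<String> split(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        List<String> parts = split(s);
        System.out.println(s.length());             // 4 chars
        System.out.println(parts.size());           // 3 codepoints
        System.out.println(parts.get(1).length());  // 2 (the surrogate pair travels together)
    }
}
```

This only keeps surrogate pairs intact. Composites built from combining marks are still separate codepoints; grouping those would need something like java.text.BreakIterator.getCharacterInstance(), which is not shown here.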

The original implementation feels a little backwards. Is there a reason it's designed the way it is?

Update: codePointAt(), codePointBefore()

It's also worth noting that codePointAt() and codePointBefore() accept an index as a parameter. However, that index counts chars (0 to length() - 1 for codePointAt(), 1 to length() for codePointBefore()) and is therefore not based on the number of codepoints in the string, as one might assume.
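
A small sketch of that behaviour, with a hypothetical nthCodePoint() helper (not a JDK method) that uses offsetByCodePoints() to translate a codepoint position into a char index first:

```java
class CodePointIndexing {
    // Layout in chars: [high surrogate][low surrogate]['x']
    static final String S = new String(Character.toChars(0x1F600)) + "x";

    // Hypothetical helper: the n-th codepoint, counting codepoints rather than chars.
    static int nthCodePoint(String s, int n) {
        int charIndex = s.offsetByCodePoints(0, n); // skip n codepoints from the start
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(S.codePointAt(0)));      // 1f600
        System.out.println(Integer.toHexString(S.codePointAt(1)));      // de00 (lone low surrogate)
        System.out.println((char) S.codePointAt(2));                    // x
        System.out.println(Integer.toHexString(S.codePointBefore(2)));  // 1f600
        System.out.println((char) nthCodePoint(S, 1));                  // x
    }
}
```

Indexing into the middle of a surrogate pair does not fail; codePointAt(1) simply returns the low surrogate's own value, which is easy to miss.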

Update: equalsIgnoreCase()

String.equalsIgnoreCase() uses the term normalization to describe what it does prior to comparing strings. This is a misnomer as normalization in the context of a Unicode string can mean something entirely different. What they mean to say is that they use case-folding.
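
To illustrate the difference, here is a rough sketch that contrasts equalsIgnoreCase() with Unicode normalization through java.text.Normalizer. The helper equalsNormalizedIgnoreCase() is hypothetical, and NFC is just one reasonable choice of normalization form:

```java
import java.text.Normalizer;

class CaseVersusNormalization {
    static final String PRECOMPOSED = "\u00C9"; // É as a single codepoint
    static final String DECOMPOSED = "e\u0301"; // e + COMBINING ACUTE ACCENT

    // Hypothetical helper: normalize both sides to NFC, then compare ignoring case.
    static boolean equalsNormalizedIgnoreCase(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equalsIgnoreCase(nb);
    }

    public static void main(String[] args) {
        // No Unicode normalization happens here, so the two spellings differ.
        System.out.println(PRECOMPOSED.equalsIgnoreCase(DECOMPOSED));            // false
        System.out.println(equalsNormalizedIgnoreCase(PRECOMPOSED, DECOMPOSED)); // true
        // The comparison is char by char, so one-to-many case mappings are not applied.
        System.out.println("stra\u00DFe".equalsIgnoreCase("STRASSE"));           // false
    }
}
```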


Answer 1:


When Java was created, Unicode didn't yet have the notion of surrogate characters, and Java chose to represent characters as 16-bit values.

I suppose they didn't want to break backwards compatibility when supplementary characters were added later. There is a lot more information here: http://www.oracle.com/us/technologies/java/supplementary-142654.html
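
As a rough illustration of where that leaves things today (the class name and the sample codepoint are just picked for the example), a char is still 16 bits wide and a supplementary codepoint is carried by two of them:

```java
class SurrogateDemo {
    // U+1D11E MUSICAL SYMBOL G CLEF, a supplementary codepoint outside the 16-bit range.
    static final int SUPPLEMENTARY = 0x1D11E;

    public static void main(String[] args) {
        char[] units = Character.toChars(SUPPLEMENTARY);

        System.out.println(Character.SIZE);                       // 16 (bits in a char)
        System.out.println(units.length);                         // 2
        System.out.println(Character.isHighSurrogate(units[0]));  // true
        System.out.println(Character.isLowSurrogate(units[1]));   // true
        // The pair recombines into the original codepoint.
        System.out.println(Character.toCodePoint(units[0], units[1]) == SUPPLEMENTARY); // true
        System.out.println(Character.charCount('A'));             // 1 (fits in a single char)
    }
}
```

Existing code that treats a char as a whole character keeps working for the 16-bit range, and the int-based codepoint methods sit alongside it for everything else.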



Source: https://stackoverflow.com/questions/34984271/clarifying-javas-evolutionary-support-of-unicode
