Given the number of a Unicode code point, how can I obtain a String or CharSequence object for that character?

纵然是瞬间 submitted on 2021-02-02 09:10:58

Question


I have seen Questions and Answers about obtaining the code point number of a Unicode character in Java. See, for example, the Question "How can I get a Unicode character's code?"

But I want the opposite: given an integer number, how do I get the text of the character assigned to that code point number?

The char primitive data type is of no use, being limited to the Basic Multilingual Plane (BMP) of the Unicode character set. That plane spans the first 65,536 code points (U+0000 to U+FFFF). But Unicode has grown well beyond that, with over 113,000 characters defined now, and the numbers assigned to characters range from 0 to 0x10FFFF, over a million. Being a 16-bit type, a char can hold only 64K distinct values, not nearly enough; a character outside the BMP has to be stored as a pair of char values (a surrogate pair).
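To make the 16-bit limit concrete, here is a small sketch (the class name SurrogateDemo is just for illustration) showing that a character above U+FFFF occupies two char values inside a Java String, even though it is a single code point. The emoji is written with escapes so the source file does not depend on its encoding.

```java
final class SurrogateDemo {
    // FACE WITH MEDICAL MASK, U+1F637, spelled as its UTF-16 surrogate pair.
    static final String MASK = "\uD83D\uDE37";

    public static void main(String[] args) {
        int codePoint = MASK.codePointAt(0);
        System.out.println("code point:        " + codePoint);                             // 128567
        System.out.println("char units:        " + MASK.length());                         // 2
        System.out.println("code points:       " + MASK.codePointCount(0, MASK.length())); // 1
        System.out.println("chars needed:      " + Character.charCount(codePoint));        // 2
        System.out.println("fits in one char?: " + Character.isBmpCodePoint(codePoint));   // false
        System.out.println("max code point:    " + Character.MAX_CODE_POINT);              // 1114111
    }
}
```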

Both the Character and String classes offer a codePointAt method that examines a character and returns an int representing the code point assigned in Unicode. I am looking for the opposite.

➥ Given an int, how do I get an object of Character, String, or some implementation of CharSequence that I can then join to other text?

When writing string literals, we can use a Unicode escape sequence (a backslash, the letter u, and four hex digits). But I am interested in working with integer variables, soft-coding rather than hard-coding the Unicode characters.
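The difference between hard-coding and soft-coding can be sketched as follows. A \u escape in Java source covers only four hex digits, one 16-bit char, so a literal for U+1F637 has to spell out its surrogate pair, whereas an int variable can hold the full code point. The names SoftCodedText and fromCodePoint are hypothetical; the conversion relies on the String( int[] codePoints, int offset, int count ) constructor, which has existed since Java 5.

```java
final class SoftCodedText {
    // Hard-coded: a Unicode escape (backslash, u, four hex digits) is one 16-bit char,
    // so U+1F637 must be written as two escapes forming a surrogate pair.
    static final String HARD_CODED = "\uD83D\uDE37";

    // Soft-coded: build the text from an int held in a variable.
    // The String(int[], int, int) constructor accepts code points directly and
    // throws IllegalArgumentException for an invalid code point.
    static String fromCodePoint(int codePoint) {
        return new String(new int[] { codePoint }, 0, 1);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F637; // could come from a file, a database, user input, ...
        String softCoded = fromCodePoint(codePoint);
        System.out.println(softCoded.equals(HARD_CODED)); // true
    }
}
```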


Answer 1:


tl;dr

String s = Character.toString( 128_567 ) ;

😷

Details

You asked for an object of Character, String, or some implementation of CharSequence.

Character

The Character class is actually legacy, a mere object wrapper around the primitive char type. The char type is legacy too, being defined internally as a 16-bit number limited to the first 64K of Unicode code points. Unicode now has more than twice that number of code points assigned to characters, so a char fails to represent most characters.

So we cannot instantiate a Character object for a character outside the Basic Multilingual Plane. As a workaround, Character.toString( int ) produces a String containing that single character. A String can handle any and all Unicode characters, while a Character cannot.
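A minimal sketch of why Character cannot help here: forcing a supplementary code point into a char with a narrowing cast silently keeps only the low 16 bits, producing a different character, while a String keeps the whole code point. The class and method names (CharacterLimit, narrowToChar) are illustrative only, and the String part assumes Java 11 or later.

```java
final class CharacterLimit {
    // A narrowing cast keeps only the low 16 bits of the int.
    static char narrowToChar(int codePoint) {
        return (char) codePoint;
    }

    public static void main(String[] args) {
        int codePoint = 128_567; // U+1F637 FACE WITH MEDICAL MASK

        // 0x1F637 becomes 0xF637, a different code point in the Private Use Area, not the emoji.
        Character boxed = narrowToChar(codePoint);
        System.out.println(Integer.toHexString(boxed.charValue())); // f637
        System.out.println(Character.isBmpCodePoint(codePoint));     // false

        // A String has no such limit: it holds the character as a surrogate pair.
        String text = Character.toString(codePoint);                 // Java 11+
        System.out.println(text.codePointAt(0) == codePoint);        // true
    }
}
```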

String 🡄 Character.toString( int )

To get a String object containing a single character determined by an int, pass the int to Character.toString( int ), a method available since Java 11.
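For code that must also run on versions before Java 11, where Character.toString( int ) does not exist, a sketch along these lines should give the same result. It uses Character.toChars( int ) (available since Java 5), which returns one char for a BMP code point or two for a surrogate pair. The helper name codePointToString is hypothetical.

```java
final class PreJava11 {
    // Character.toChars returns a char[] of length 1 (BMP) or 2 (surrogate pair),
    // and throws IllegalArgumentException for an invalid code point.
    static String codePointToString(int codePoint) {
        return String.valueOf(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = codePointToString(128_567);
        System.out.println(s.length());                  // 2 (one surrogate pair)
        System.out.println(s.codePointAt(0) == 128_567); // true
    }
}
```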

As an example, we use FACE WITH MEDICAL MASK, an emoji character at U+1F637 (decimal: 128,567).

// -----|  input  |----------------
String input = "😷" ;                                 // FACE WITH MEDICAL MASK at code point U+1F637 (decimal: 128,567).
int codePoint = input.codePointAt( 0 ) ;              // Returns 128,567. 
System.out.println( "codePoint : " + codePoint ) ;   

codePoint : 128567

Convert that int primitive variable to a String.

// -----|  String  |----------------
String output = Character.toString( codePoint ) ;     // Pass an `int` primitive integer number.
System.out.println( "output : " + output ) ; 

output : 😷

Or use a literal integer number.

String output2 = Character.toString( 128_567 ) ;      // Pass an integer literal.
System.out.println( "output2 : " + output2 ) ;

output2 : 😷

See this code run live at IdeOne.com.
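Because the whole point is soft-coding, the int may come from anywhere, and Character.toString( int ) throws IllegalArgumentException when the number is not a valid code point (negative, or above 0x10FFFF). Here is a hedged sketch of checking first with Character.isValidCodePoint; the class name SafeCodePoint and method textFor are hypothetical. Rejecting lone surrogate values (0xD800 to 0xDFFF) is a design choice of this sketch: they pass isValidCodePoint but are not characters on their own.

```java
import java.util.Optional;

final class SafeCodePoint {
    // Returns the text for a code point, or empty if the number cannot stand alone as a character.
    static Optional<String> textFor(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return Optional.empty(); // outside 0..0x10FFFF
        }
        if (codePoint >= Character.MIN_SURROGATE && codePoint <= Character.MAX_SURROGATE) {
            return Optional.empty(); // a lone surrogate is half of a pair, not a character
        }
        return Optional.of(Character.toString(codePoint)); // Java 11+
    }

    public static void main(String[] args) {
        System.out.println(textFor(128_567).isPresent());  // true
        System.out.println(textFor(0x110000).isPresent()); // false: one past Character.MAX_CODE_POINT
        System.out.println(textFor(-1).isPresent());       // false
    }
}
```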

CharSequence

The code above works, as String is an implementation of CharSequence.

CharSequence cs = Character.toString( 128_567 ) ;     // Returns a `String` which is a `CharSequence`. 
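Since the result is a CharSequence, it can be handed to any API that accepts one. As a sketch of the "join to other text" part of the question, String.join (Java 8+) takes CharSequence elements; JoinDemo and greet are illustrative names only, and Character.toString( int ) assumes Java 11+.

```java
final class JoinDemo {
    // Joins a code point's text onto other text; any CharSequence works as an element.
    static String greet(int codePoint) {
        CharSequence mask = Character.toString(codePoint); // a String is a CharSequence
        return String.join(" ", "Stay safe", mask);
    }

    public static void main(String[] args) {
        String line = greet(128_567);
        System.out.println(line.length());                         // 12 chars
        System.out.println(line.codePointCount(0, line.length())); // 11 code points
    }
}
```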

StringBuilder and StringBuffer, the other common CharSequence implementations, also let you add a character by number: their appendCodePoint( int ) method appends the character for a given code point, writing a surrogate pair when needed.
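A minimal sketch of building text from several int values with StringBuilder.appendCodePoint( int ), available since Java 5 (StringBuffer has the same method). The helper name build is hypothetical; appendCodePoint itself throws IllegalArgumentException for an invalid code point.

```java
final class BuilderDemo {
    // appendCodePoint adds one character given its code point number,
    // writing two chars (a surrogate pair) when the code point is above U+FFFF.
    static String build(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = build(0x48, 0x69, 0x20, 128_567);         // "Hi " followed by U+1F637
        System.out.println(s.length());                      // 5 chars
        System.out.println(s.codePointCount(0, s.length())); // 4 code points
    }
}
```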



Source: https://stackoverflow.com/questions/60347814/given-the-number-of-a-unicode-code-point-how-can-i-obtain-a-string-or-charseque
