astral-plane

In Windows, how do you enter a character outside of the Unicode Basic Multilingual Plane?

老子叫甜甜 submitted on 2019-12-03 12:52:35
I know that Windows has supported supplementary planes since Windows XP. I have fonts which I know contain characters outside the Basic Multilingual Plane (BMP). For these characters, the Unicode code point consists of five hexadecimal digits. I do not know how to enter these characters in applications. Windows seems to support keyboard entry only for characters in the BMP: you can either enter a decimal number, or some applications allow you to enter a four-digit hexadecimal number. Can someone confirm how entry is managed? I don't care whether it is done directly from the keyboard or application-assisted.
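All of these entry mechanisms ultimately reduce to the UTF-16 surrogate-pair arithmetic, so a program can always construct such a character itself even when the keyboard cannot. A minimal Java sketch (U+1F600 is just an illustrative code point, not one from the question):

```java
public class SurrogateEncode {
    public static void main(String[] args) {
        int cp = 0x1F600; // example code point outside the BMP
        // Surrogate-pair arithmetic defined by UTF-16:
        int v = cp - 0x10000;               // 20 payload bits
        char high = (char) (0xD800 + (v >> 10));    // top 10 bits
        char low  = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits
        System.out.printf("U+%X -> 0x%X 0x%X%n", cp, (int) high, (int) low);
        // The standard library performs the same computation:
        char[] pair = Character.toChars(cp);
        System.out.println(high == pair[0] && low == pair[1]); // true
    }
}
```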

What version of Unicode is supported by which .NET platform and on which version of Windows in regards to character classes?

前提是你 submitted on 2019-12-02 23:43:40
Updated question¹ — With regard to character classes, comparison, sorting, normalization and collations, which Unicode version or versions are supported by which .NET platforms? Original question — I remember somewhat vaguely having read that .NET supported Unicode version 3.0, and that the internal encoding is not really UTF-16 but actually UCS-2, which is not the same. It seems, for instance, that characters above U+FFFF are not possible; consider: string s = "\u1D7D9"; // ("Mathematical double-struck digit one") — this stores the string "ᵽ9", because the \u escape consumes only four hexadecimal digits.
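The same escape-length pitfall exists in Java, where \u likewise consumes exactly four hex digits; a sketch of both the wrong and the intended result:

```java
public class EscapeLength {
    public static void main(String[] args) {
        // "\u1D7D9" parses as U+1D7D ("ᵽ") followed by the literal '9'
        String s = "\u1D7D9";
        System.out.println(s.length());   // 2, but not a surrogate pair
        System.out.println(s.charAt(1));  // '9'
        // Building from the code point yields the intended character:
        String t = new String(Character.toChars(0x1D7D9));
        System.out.println(t.codePointAt(0) == 0x1D7D9); // true
        System.out.println(t.length());   // 2: a genuine surrogate pair
    }
}
```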

Retrieve Unicode code points > U+FFFF from QChar

怎甘沉沦 submitted on 2019-12-01 04:28:45
I have an application that is supposed to deal with all kinds of characters and at some point display information about them. I use Qt and its inherent Unicode support in QChar, QString etc. Now I need the code point of a QChar in order to look up some data in http://unicode.org/Public/UNIDATA/UnicodeData.txt , but QChar's unicode() method only returns a ushort (unsigned short), which is a number from 0 to 65535 (0xFFFF). There are characters with code points > 0xFFFF, so how do I get these? Is there some trick I am missing, or is this currently not supported by Qt/QChar?
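Newer Qt versions expose helpers for this (e.g. QString::toUcs4()); the underlying recombination of two QChar code units into one code point is plain UTF-16 arithmetic, sketched here in Java rather than Qt C++:

```java
public class SurrogateDecode {
    public static void main(String[] args) {
        char high = 0xD800, low = 0xDC00; // the pair encoding U+10000
        // UTF-16 decoding: recombine the ten payload bits of each unit
        int cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
        System.out.printf("U+%X%n", cp); // U+10000
        // Library equivalent of the same arithmetic:
        System.out.println(Character.toCodePoint(high, low) == cp); // true
    }
}
```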

C# Regular Expressions with \Uxxxxxxxx characters in the pattern

醉酒当歌 submitted on 2019-11-29 10:36:57
Regex.IsMatch( "foo", "[\U00010000-\U0010FFFF]" ) Throws: System.ArgumentException: parsing "[-]" - [x-y] range in reverse order. Looking at the hex values for \U00010000 and \U0010FFF I get: 0xd800 0xdc00 for the first character and 0xdbff 0xdfff for the second. So I guess I have really have one problem. Why are the Unicode characters formed with \U split into two chars in the string? They're surrogate pairs . Look at the values - they're over 65535. A char is only a 16 bit value. How would you expression 65536 in only 16 bits? Unfortunately it's not clear from the documentation how (or

char to Unicode more than U+FFFF in java?

孤者浪人 submitted on 2019-11-29 06:03:15
How can I display a Unicode character above U+FFFF using char in Java? I need something like this (if it were valid): char u = '\u+10FFFF'; Answer (Jon Skeet): You can't do it with a single char (which holds a UTF-16 code unit), but you can use a String: // This represents U+10FFFF String x = "\udbff\udfff"; Alternatively: String y = new StringBuilder().appendCodePoint(0x10ffff).toString(); That is a surrogate pair (two UTF-16 code units which combine to form a single Unicode code point beyond the Basic Multilingual Plane). Of course, you need whatever is going to display your data to cope with it too.
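The answer's two approaches, made runnable with a couple of checks:

```java
public class AstralChar {
    public static void main(String[] args) {
        // U+10FFFF as an explicit surrogate pair...
        String x = "\udbff\udfff";
        // ...and built from the code point:
        String y = new StringBuilder().appendCodePoint(0x10ffff).toString();
        System.out.println(x.equals(y)); // true
        System.out.println(x.length());  // 2 code units...
        System.out.println(x.codePointCount(0, x.length())); // ...one code point
    }
}
```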

How would you get an array of Unicode code points from a .NET String?

回眸只為那壹抹淺笑 submitted on 2019-11-28 07:32:01
I have a list of character-range restrictions that I need to check a string against, but the char type in .NET is a UTF-16 code unit, and therefore some characters become surrogate pairs. Thus when enumerating all the chars in a string, I don't get the 32-bit Unicode code points, and some comparisons with high values fail. I understand Unicode well enough that I could parse the bytes myself if necessary, but I'm looking for a C#/.NET Framework BCL solution. So... how would you convert a string to an array (int[]) of 32-bit Unicode code points?
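For comparison, the Java standard library does the requested conversion in one call (String.codePoints(), Java 8+); any .NET solution has to perform the equivalent surrogate-aware walk. A Java sketch:

```java
import java.util.Arrays;

public class CodePointArray {
    public static void main(String[] args) {
        // 'a', a supplementary character (U+10400), then 'b'
        String s = "a" + new String(Character.toChars(0x10400)) + "b";
        // codePoints() walks surrogate pairs as single int values
        int[] cps = s.codePoints().toArray();
        System.out.println(Arrays.toString(cps)); // [97, 66560, 98]
        System.out.println(s.length());           // 4 code units
        System.out.println(cps.length);           // 3 code points
    }
}
```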

Java charAt used with characters that have two code units

假如想象 submitted on 2019-11-27 22:57:56
From Core Java , vol. 1, 9th ed., p. 69: The character ℤ requires two code units in the UTF-16 encoding. Calling String sentence = "ℤ is the set of integers"; // for clarity; not in book char ch = sentence.charAt(1) doesn't return a space but the second code unit of ℤ. But it seems that sentence.charAt(1) does return a space. For example, the if statement in the following code evaluates to true . String sentence = "ℤ is the set of integers"; if (sentence.charAt(1) == ' ') System.out.println("sentence.charAt(1) returns a space"); Why? I'm using JDK SE 1.7.0_09 on Ubuntu 12.10, if it's relevant.
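One likely explanation, offered as a check rather than as the book's intent: the ℤ reproduced here is U+2124 (DOUBLE-STRUCK CAPITAL Z), which lies inside the BMP and occupies a single UTF-16 code unit, so charAt(1) really is the space. The book's claim holds for a character that genuinely needs two code units, e.g. the mathematical double-struck small z, assumed here to be U+1D56B:

```java
public class CharAtSurrogates {
    public static void main(String[] args) {
        // U+2124 is inside the BMP: one code unit, so index 1 is the space.
        String sentence = "\u2124 is the set of integers";
        System.out.println(sentence.charAt(1) == ' '); // true
        // A character outside the BMP occupies two code units,
        // so charAt(1) lands on its low surrogate, not the space:
        String supp = new String(Character.toChars(0x1D56B)) + " is an example";
        System.out.println(supp.charAt(1) == ' ');                    // false
        System.out.println(Character.isLowSurrogate(supp.charAt(1))); // true
    }
}
```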