Question
I made a program where people can type in 4 letters and it gives them the corresponding Unicode character, which it inserts into a TextFlow element. I had a lot of problems with this, but in the end I succeeded with some help. The problem came when I typed "dddd" or "ddd1" as a test.
I got the error - "An unpaired Unicode surrogate was encountered in the input."
I spent about 2 days testing for that, and there was absolutely no event triggered that made it possible for me to test for the error before it occurred.
The code:
var str:String = "dddd";
var num:Number = parseInt(str, 16); // 0xDDDD
var res:String = String.fromCharCode(num);
Actually, when the error occurs, res shows up as "?" in the console ... but if you test for it with if (res == "?") it returns false.
MY QUESTION: I searched and searched and found absolutely no description of this error in Adobe's AS3 reference, but after 2 days I found this page for JavaScript: http://scripts.sil.org/cms/scripts/page.php?item_id=IWS-Chapter04a
It says that - The code units in the range 0xD800–0xDFFF, serve a special purpose, however. These code units, known as surrogate code units
So now I test with:
if ((num > 0 && num < uint(0xD800)) || (num > uint(0xDFFF) && num < uint(0xFFFF))) {
    // get unicode character
}
My question is simply whether I understood this correctly, and whether this will actually prevent the error from occurring. I'm no Unicode specialist and don't really know how to test for it, since there are tens of thousands of characters, so I might have missed one, and that would mean users could hit the error by accident and risk crashing the application.
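Since a 4-digit hex input can only produce 65,536 values, one way to check the range test is to loop over all of them instead of trying characters by hand. The following is a minimal sketch in TypeScript rather than AS3 (the syntax is close), assuming String.fromCharCode treats its argument as a single 16-bit code unit as it does in JavaScript; isSurrogate and countSurrogates are hypothetical helper names, not library functions.

```typescript
// Hypothetical helper: true when a 16-bit value lies in the surrogate range.
function isSurrogate(codeUnit: number): boolean {
  return codeUnit >= 0xD800 && codeUnit <= 0xDFFF;
}

// Walk every 16-bit value and compare the numeric range test against a
// check on the string that String.fromCharCode actually produces.
function countSurrogates(): number {
  let count = 0;
  for (let codeUnit = 0; codeUnit <= 0xFFFF; codeUnit++) {
    const res = String.fromCharCode(codeUnit);
    // Without the "u" flag the character class matches single code units.
    const lone = /[\uD800-\uDFFF]/.test(res);
    if (lone !== isSurrogate(codeUnit)) {
      throw new Error("range test disagrees at 0x" + codeUnit.toString(16));
    }
    if (lone) {
      count++;
    }
  }
  return count;
}
```

If the loop finishes without throwing, the numeric test and the string check agree on every value, so no single character can slip past the range test.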
Answer 1:
You are correct. A code point ("high surrogate") between 0xD800-0xDBFF must be paired with a code point ("low surrogate") between 0xDC00-0xDFFF. Those are reserved for use in UTF-16[1] - when needing to address the higher planes that don't fit in 16 bits - and hence those code points can't appear on their own. For example:
0xD802 DC01 corresponds to (I'll leave out the 0x hex markers):
10000 + (high - D800) * 0400 + (low - DC00)
= 10000 + (D802 - D800) * 0400 + (DC01 - DC00)
= 10000 + 0002 * 0400 + 0001
= 10801, i.e. code point U+10801, which UTF-16 expresses as the pair D802 DC01
... just adding that bit of info in case you later need to support it.
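The arithmetic above can be sketched as a pair of functions. This is an illustrative TypeScript sketch, not AS3 and not a library API; decodeSurrogatePair and encodeSurrogatePair are hypothetical names chosen for the example.

```typescript
// Combine a high and a low surrogate into the code point they represent.
function decodeSurrogatePair(high: number, low: number): number {
  if (high < 0xD800 || high > 0xDBFF) {
    throw new RangeError("not a high surrogate");
  }
  if (low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError("not a low surrogate");
  }
  return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00);
}

// Split a code point above 0xFFFF into its [high, low] surrogate pair.
function encodeSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("code point does not need a surrogate pair");
  }
  const offset = codePoint - 0x10000;
  // Upper 10 bits go to the high surrogate, lower 10 bits to the low one.
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}
```

With the values from the example, decodeSurrogatePair(0xD802, 0xDC01) gives 0x10801, and encodeSurrogatePair(0x10801) gives back the same two code units.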
I haven't tested the AS3 functionality for the following, but you may want to also test the input below - you won't get the surrogate error for these, but might get another error message:
- 0xFFFE and 0xFFFF (when using higher planes, also any code point "ending" with those bits, e.g. 0x1FFFE and 0x1FFFF; 0x2FFFE and 0x2FFFF etc.). Those are "non-characters".
- The same goes for 0xFDD0-0xFDEF - also "non-characters".
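A check for these values could look like the sketch below. It is written in TypeScript rather than AS3, and isNoncharacter is a hypothetical helper name; it assumes the Unicode definition of noncharacters, which is the block U+FDD0 to U+FDEF plus the last two code points of every plane. Whether AS3 actually raises an error for them is untested here.

```typescript
// Hypothetical helper: true when a code point is a Unicode noncharacter.
function isNoncharacter(codePoint: number): boolean {
  // Outside the Unicode range there is nothing to classify.
  if (codePoint < 0 || codePoint > 0x10FFFF) {
    return false;
  }
  // The contiguous block of 32 noncharacters in the BMP.
  if (codePoint >= 0xFDD0 && codePoint <= 0xFDEF) {
    return true;
  }
  // The last two code points of each plane end in FFFE or FFFF.
  return (codePoint & 0xFFFE) === 0xFFFE;
}
```

For 4-digit input only the first plane matters, so in practice this rejects 0xFDD0 to 0xFDEF, 0xFFFE and 0xFFFF; the plane check only comes into play once surrogate pairs are supported.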
[1] AS3 actually uses UTF-16 to store its strings, but even if it didn't, the surrogate code points would still have no meaning outside pairs - the code points are reserved and can't be used in other Unicode encodings either (e.g. UTF-8 or UTF-32).
Source: https://stackoverflow.com/questions/17957569/big-unicode-problems-as3