Question
Is there any conceivable reason why I would see different results using Unicode string literals versus the actual hex value for the UChar?
UnicodeString s1(0x0040); // @ sign
UnicodeString s2("\u0040");
s1 isn't equivalent to s2. Why?
Answer 1:
As far as I know, how a \u escape inside a narrow string literal gets encoded is implementation-defined, so it's hard to say why they are not equivalent without knowing the details of your particular compiler. That said, it's simply not a safe way of doing things.
UnicodeString has a constructor taking a UChar and one for UChar32. I'd be explicit when using them:
UnicodeString s(static_cast<UChar>(0x0040));
UnicodeString also provides an unescape() method that's fairly handy:
UnicodeString s = UNICODE_STRING_SIMPLE("\\u4ECA\\u65E5\\u306F").unescape(); // 今日は
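To make the idea behind unescaping concrete, here is a minimal standalone sketch that expands \uXXXX escapes from a plain ASCII string into UTF-16 code units. It is not ICU's implementation: the helper name unescapeBmp is hypothetical, it only accepts exactly four hex digits after \u, and ICU's real unescape() handles more escape forms. The point is that the source file then contains only ASCII, so the compiler's character set no longer matters.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ICU): expands "\uXXXX" escapes, with
// exactly four hex digits, in an ASCII string into UTF-16 code units.
std::u16string unescapeBmp(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == 'u') {
            // The whole escape occupies six characters: '\', 'u', 4 digits.
            if (i + 6 > in.size()) {
                throw std::invalid_argument("truncated \\u escape");
            }
            unsigned value = 0;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                const char c = in[k];
                unsigned digit = 0;
                if (c >= '0' && c <= '9') {
                    digit = static_cast<unsigned>(c - '0');
                } else if (c >= 'a' && c <= 'f') {
                    digit = static_cast<unsigned>(c - 'a') + 10u;
                } else if (c >= 'A' && c <= 'F') {
                    digit = static_cast<unsigned>(c - 'A') + 10u;
                } else {
                    throw std::invalid_argument("bad hex digit in \\u escape");
                }
                value = value * 16u + digit;
            }
            out.push_back(static_cast<char16_t>(value));
            i += 5;  // skip the rest of the escape; the loop's ++i steps past it
        } else {
            // Plain ASCII passes through unchanged.
            out.push_back(static_cast<char16_t>(static_cast<unsigned char>(in[i])));
        }
    }
    return out;
}
```

Calling unescapeBmp("\\u4ECA\\u65E5\\u306F") yields the three code units 0x4ECA, 0x65E5, 0x306F regardless of how the compiler encodes narrow literals, because the input is pure ASCII.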
Answer 2:
I couldn't reproduce this on ICU 4.8.1.1:
#include <stdio.h>
#include "unicode/unistr.h"

int main(int argc, const char *argv[]) {
    UnicodeString s1(0x0040); // @ sign
    UnicodeString s2("\u0040");
    printf("s1==s2: %s\n", (s1==s2)?"T":"F");
    // printf("s1.equals s2: %d\n", s1.equals(s2));
    printf("s1.length: %d s2.length: %d\n", s1.length(), s2.length());
    printf("s1.charAt(0)=U+%04X s2.charAt(0)=U+%04X\n", s1.charAt(0), s2.charAt(0));
    return 0;
}
=>
s1==s2: T
s1.length: 1 s2.length: 1
s1.charAt(0)=U+0040 s2.charAt(0)=U+0040
Tested with gcc 4.4.5 on RHEL 6.1, x86_64.
Answer 3:
For anyone else who finds this, here's what I found in ICU's documentation [1]:
The compiler's and the runtime character set's codepage encodings are not specified by the C/C++ language standards and are usually not a Unicode encoding form. They typically depend on the settings of the individual system, process, or thread. Therefore, it is not possible to instantiate a Unicode character or string variable directly with C/C++ character or string literals. The only safe way is to use numeric values. It is not an issue for User Interface (UI) strings that are translated.
[1] http://userguide.icu-project.org/strings
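The point in the quoted passage can be observed without ICU at all. The sketch below, which assumes a C++11 or later compiler, stores the same \u escape once in a narrow literal and once in a UTF-16 literal and dumps the resulting code units. The function names are made up for illustration. The narrow result depends on the compiler's execution character set, while the u"" literal always holds the code unit 0x4ECA.

```cpp
#include <cstdio>
#include <string>

// Narrow literal: the bytes stored here depend on the compiler's execution
// character set, so the result can differ between compilers and settings.
inline std::string narrowLiteral() { return "\u4ECA"; }

// UTF-16 literal (C++11): always the single UTF-16 code unit 0x4ECA.
inline std::u16string utf16Literal() { return u"\u4ECA"; }

// Format every code unit of a char or char16_t string as hex, each followed
// by a space.
template <typename String>
std::string hexUnits(const String& s) {
    std::string out;
    char buf[8];
    for (auto c : s) {
        // Mask to the unit width so a negative (signed) char does not
        // sign-extend into a large value.
        const unsigned v =
            static_cast<unsigned>(c) & (sizeof(c) == 1 ? 0xFFu : 0xFFFFu);
        std::snprintf(buf, sizeof buf, "%02X ", v);
        out += buf;
    }
    return out;
}
```

hexUnits(utf16Literal()) gives "4ECA " everywhere, whereas hexUnits(narrowLiteral()) shows whatever bytes the compiler chose. Passing such a narrow string to a constructor that interprets it in some other codepage is where mismatches like the one in the question can come from.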
Answer 4:
The double quotes in your \u constant are the problem. This evaluated properly:
wchar_t m1( 0x0040 );
wchar_t m2( '\u0040' );
bool equal = ( m1 == m2 );
equal was true.
Source: https://stackoverflow.com/questions/8144886/unicodestring-w-string-literals-vs-hex-values