UnicodeString w/ String Literals vs Hex Values

Submitted by 白昼怎懂夜的黑 on 2019-12-25 03:37:12

Question


Is there any conceivable reason why I would see different results using Unicode string literals versus the actual hex value for the UChar?

UnicodeString s1(0x0040); // @ sign
UnicodeString s2("\u0040");

s1 isn't equivalent to s2. Why?


Answer 1:


AFAIK, how a \u escape inside an ordinary (narrow) string literal gets encoded depends on the compiler's execution character set, so it's hard to say why they are not equivalent without knowing the details of your particular compiler. In any case, it's simply not a safe way of doing things.
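To make the dependency on the execution character set concrete, here is a minimal standard-C++ sketch that does not use ICU at all. The names kUtf16, kNarrow, unitCount and firstUnitIs are made up for illustration. It contrasts a narrow literal, whose byte count varies with the compiler's settings, with a u"..." literal, which C++11 and later define as UTF-16:

```cpp
#include <cstddef>

// UTF-16 literal: the language fixes its encoding, so this holds the
// single code unit 0x00E9 regardless of the execution character set.
constexpr char16_t kUtf16[] = u"\u00E9";

// Narrow literal: its bytes are produced in the execution character set,
// so the length is not portable (e.g. 2 bytes in UTF-8, 1 in Latin-1).
constexpr char kNarrow[] = "\u00E9";

// Number of code units in a literal array, not counting the terminator.
template <typename CharT, std::size_t N>
constexpr std::size_t unitCount(const CharT (&)[N]) {
    return N - 1;
}

// True if the first UTF-16 code unit of the literal equals `expected`.
template <std::size_t N>
constexpr bool firstUnitIs(const char16_t (&text)[N], char16_t expected) {
    return N > 1 && text[0] == expected;
}
```

With a UTF-8 execution character set, unitCount(kNarrow) would be 2, while under Latin-1 it would be 1; unitCount(kUtf16) is 1 in both cases, and firstUnitIs(u"\u0040", 0x0040) holds everywhere.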

UnicodeString has a constructor taking a UChar and one for UChar32. I'd be explicit when using them:

UnicodeString s(static_cast<UChar>(0x0040));

UnicodeString also provides an unescape() method that's fairly handy:

UnicodeString s = UNICODE_STRING_SIMPLE("\\u4ECA\\u65E5\\u306F").unescape(); // 今日は
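The reason this route is robust, as far as I can tell, is that the literal itself contains only plain ASCII characters (backslash, u, hex digits), and the code unit values are computed from the digits rather than by the compiler's charset conversion. The following is not ICU's implementation, only a rough standard-C++ sketch of that idea for a single well-formed \uXXXX escape; hexValue and decodeEscape are hypothetical names:

```cpp
#include <stdexcept>

// Numeric value of one hexadecimal digit; throws on anything else.
constexpr int hexValue(char c) {
    return (c >= '0' && c <= '9') ? c - '0'
         : (c >= 'A' && c <= 'F') ? c - 'A' + 10
         : (c >= 'a' && c <= 'f') ? c - 'a' + 10
         : throw std::invalid_argument("not a hex digit");
}

// Decode the escape starting at p into one UTF-16 code unit.
// Assumes p points at a backslash, then 'u', then four hex digits;
// the prefix itself is not validated in this sketch.
constexpr char16_t decodeEscape(const char* p) {
    return static_cast<char16_t>((hexValue(p[2]) << 12) |
                                 (hexValue(p[3]) << 8) |
                                 (hexValue(p[4]) << 4) |
                                  hexValue(p[5]));
}
```

For example, decodeEscape("\\u4ECA") yields the code unit 0x4ECA, the first character of the string above.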



Answer 2:


I couldn't reproduce this on ICU 4.8.1.1:

#include <stdio.h>
#include "unicode/unistr.h"

int main(int argc, const char *argv[]) {
  UnicodeString s1(0x0040); // @ sign
  UnicodeString s2("\u0040");
  printf("s1==s2: %s\n", (s1==s2)?"T":"F");
  //  printf("s1.equals s2: %d\n", s1.equals(s2));
  printf("s1.length: %d  s2.length: %d\n", s1.length(), s2.length());
  printf("s1.charAt(0)=U+%04X s2.charAt(0)=U+%04X\n", s1.charAt(0), s2.charAt(0));
  return 0;
}

Output:

s1==s2: T
s1.length: 1  s2.length: 1
s1.charAt(0)=U+0040 s2.charAt(0)=U+0040

Built with gcc 4.4.5 on RHEL 6.1 x86_64.




Answer 3:


For anyone else who finds this, here's what I found in ICU's documentation [1]:

The compiler's and the runtime character set's codepage encodings are not specified by the C/C++ language standards and are usually not a Unicode encoding form. They typically depend on the settings of the individual system, process, or thread. Therefore, it is not possible to instantiate a Unicode character or string variable directly with C/C++ character or string literals. The only safe way is to use numeric values. It is not an issue for User Interface (UI) strings that are translated.

[1] http://userguide.icu-project.org/strings
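As a small illustration of the "numeric values" approach the quote recommends, here is a standard-C++ sketch (no ICU) that turns a numeric code point into UTF-16 code units, which is roughly what one would expect a UChar32-based constructor to need to do. Utf16Units and encodeUtf16 are hypothetical names, and the sketch assumes the input is a valid Unicode scalar value:

```cpp
// Result of encoding one code point: one or two UTF-16 code units.
struct Utf16Units {
    char16_t units[2];
    int count;
};

// Encode a numeric code point as UTF-16.
// Code points above U+FFFF become a surrogate pair.
// Assumes cp <= 0x10FFFF and cp is not itself a surrogate (not checked).
constexpr Utf16Units encodeUtf16(char32_t cp) {
    return cp < 0x10000
        ? Utf16Units{{static_cast<char16_t>(cp), 0}, 1}
        : Utf16Units{{static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
                      static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))},
                     2};
}
```

Because only integers are involved, the result does not depend on the source or execution character set: encodeUtf16(0x0040) is the single unit 0x0040, and encodeUtf16(0x1F600) is the pair 0xD83D, 0xDE00.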




Answer 4:


The double quotes in your \u constant are the problem. This evaluated properly:

wchar_t m1( 0x0040 );
wchar_t m2( '\u0040' );
bool equal = ( m1 == m2 );

equal was true.



Source: https://stackoverflow.com/questions/8144886/unicodestring-w-string-literals-vs-hex-values
