How do you safely declare a 16-bit string literal in C?

Submitted by 我只是一个虾纸丫 on 2020-08-07 03:40:37

Question


I'm aware that there is already a standard way to do this, prefixing the literal with L:

wchar_t *test_literal = L"Test";

The problem is that wchar_t is not guaranteed to be 16 bits, but my project needs a 16-bit wchar_t. I'd also like to avoid having to pass -fshort-wchar to the compiler.

So, is there any prefix for C (not C++) that will allow me to declare a UTF-16 string literal?


Answer 1:


So, is there any prefix for C (not C++) that will allow me to declare a UTF-16 string literal?

Almost, but not quite. C2011 offers you these options:

  • character string literals (elements of type char) - no prefix. Example: "Test"
  • UTF-8 string literals (elements of type char) - 'u8' prefix. Example: u8"Test"
  • wide string literals of three flavors:
    • wchar_t elements - 'L' prefix. Example: L"Test"
    • char16_t elements - 'u' prefix. Example: u"Test"
    • char32_t elements - 'U' prefix. Example: U"Test"
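To make the list above concrete, here is a minimal C2011 sketch that declares one literal of each kind. It assumes a compiler and C library that provide <uchar.h>; the names plain, utf8, wide, utf16, utf32, ELEMENT_COUNT and u16_length are illustrative, not from the answer. It checks at compile time that every array has five elements (four characters plus the null terminator) regardless of element size, and adds a hand-written length function, since C2011 has no strlen() counterpart for char16_t strings.

```c
#include <assert.h>  /* static_assert (C2011) */
#include <stddef.h>  /* wchar_t, size_t */
#include <uchar.h>   /* char16_t, char32_t (C2011) */

/* One literal of each kind listed above. Each array holds the four
   characters of "Test" plus a terminating null element. */
static const char     plain[] = "Test";
static const char     utf8[]  = u8"Test";
static const wchar_t  wide[]  = L"Test";
static const char16_t utf16[] = u"Test";
static const char32_t utf32[] = U"Test";

/* Element count of an array object (does not work on pointers). */
#define ELEMENT_COUNT(a) (sizeof (a) / sizeof (a)[0])

/* The element count is 5 for every kind, whatever the element size. */
static_assert(ELEMENT_COUNT(plain) == 5, "plain: 4 chars + null");
static_assert(ELEMENT_COUNT(utf8)  == 5, "utf8: 4 chars + null");
static_assert(ELEMENT_COUNT(wide)  == 5, "wide: 4 chars + null");
static_assert(ELEMENT_COUNT(utf16) == 5, "utf16: 4 chars + null");
static_assert(ELEMENT_COUNT(utf32) == 5, "utf32: 4 chars + null");

/* C2011 has no strlen() counterpart for char16_t strings, so count
   code units up to the terminator by hand. */
static size_t u16_length(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

u16_length counts code units, not characters: a character outside the Basic Multilingual Plane occupies two char16_t elements.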

Note well, however, that although you can declare a wide string literal with elements of type char16_t, the standard does not guarantee that those elements use the UTF-16 encoding, nor does it impose any particular requirement on which characters outside the language's basic character set must be included in the execution character set. You can test for the former at compile time, however: if char16_t values are UTF-16 encoded in a given conforming implementation, that implementation defines the macro __STDC_UTF_16__ to 1.
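A minimal sketch of that compile-time test, assuming a C2011 compiler and <uchar.h>: it refuses to build unless __STDC_UTF_16__ is 1, then relies on UTF-16 encoding a character outside the Basic Multilingual Plane (U+1F600, chosen arbitrarily) as a surrogate pair. The name emoji is illustrative.

```c
#include <assert.h>  /* static_assert (C2011) */
#include <uchar.h>   /* char16_t */

/* An implementation defines __STDC_UTF_16__ to 1 when char16_t values
   are UTF-16 encoded; without that promise, refuse to compile. */
#if !defined(__STDC_UTF_16__) || __STDC_UTF_16__ != 1
#error "char16_t is not guaranteed to be UTF-16 on this implementation"
#endif

/* U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 encodes
   it as a surrogate pair: two code units plus the null terminator. */
static const char16_t emoji[] = u"\U0001F600";

static_assert(sizeof emoji / sizeof emoji[0] == 3,
              "one surrogate pair plus the null terminator");
```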

Note also that you need to include C's <uchar.h> header to use the type name char16_t, but the u"..." literal syntax itself does not depend on it. Take care, as this header name collides with one used by the C interface of the International Components for Unicode (ICU), a relatively widely used package for Unicode support.
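To illustrate that the literal syntax does not need the header, this sketch declares a UTF-16 literal with only <stdint.h> included. It assumes C2011, which specifies char16_t as the same type as uint_least16_t, so that name can stand in for the element type. The name greeting is illustrative.

```c
#include <assert.h>  /* static_assert (C2011) */
#include <stdint.h>  /* uint_least16_t; <uchar.h> is deliberately NOT included */

/* u"..." is core-language syntax, so this compiles without <uchar.h>.
   uint_least16_t is the same type as char16_t in C2011. */
static const uint_least16_t greeting[] = u"Test";

static_assert(sizeof greeting / sizeof greeting[0] == 5,
              "four code units plus the null terminator");
```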

Finally, be aware that much of this was new in C2011, so you need a conforming C2011 implementation to use it. Those are certainly available, but so are many implementations that conform only to earlier standards, or to none at all. Standard C99 and earlier provide no string literal syntax that guarantees 16-bit elements.
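One way to act on this, sketched under assumptions: select the u"..." literal only when __STDC_VERSION__ reports C2011 or later, and otherwise fall back to a C99 array of hand-written UTF-16 code units. The fallback is this sketch's own workaround, not something the answer proposes; the name test_literal16 is hypothetical, and the C2011 branch yields UTF-16 values only where __STDC_UTF_16__ is 1.

```c
#include <assert.h>  /* static_assert, used only in the C2011 branch */
#include <stdint.h>  /* uint_least16_t (C99) */

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C2011 or later: a real UTF-16 string literal. */
static const uint_least16_t test_literal16[] = u"Test";
static_assert(sizeof test_literal16 / sizeof test_literal16[0] == 5,
              "four code units plus the null terminator");
#else
/* Pre-C2011 fallback (an assumption of this sketch): no literal syntax
   exists, so spell out the UTF-16 code units by hand.
   0x54 0x65 0x73 0x74 are 'T' 'e' 's' 't' in Unicode. */
static const uint_least16_t test_literal16[] = { 0x54, 0x65, 0x73, 0x74, 0 };
#endif
```

Writing the fallback in hex rather than as 'T', 'e', ... keeps it independent of the execution character set, at the cost of readability.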




Answer 2:


You need a 16-bit wchar_t, but its size is out of your control: if the compiler makes it 32 bits, then it is 32 bits, no matter what you want or need.

In C++, the string classes are templates, so you can always instantiate one with a 16-bit character type. Personally, I would try to remove any Unicode handling that is not UTF-8.

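An alternative is a clever preprocessor check (strictly an #if on a value such as WCHAR_MAX, since #ifdef only tests whether a macro is defined) that produces a compile-time error if wchar_t is not 16 bits, so that you solve the problem only when you actually need to solve it.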
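A sketch of such a check, assuming <stdint.h> and a C2011 compiler for the final static_assert. Because #ifdef can only test whether a macro is defined and #if cannot evaluate sizeof, it compares WCHAR_MAX instead. WCHAR_IS_16_BIT and REQUIRE_16_BIT_WCHAR are hypothetical names invented for this example.

```c
#include <assert.h>  /* static_assert (C2011) */
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* wchar_t */
#include <stdint.h>  /* WCHAR_MAX, usable in #if (sizeof is not) */

/* 0xFFFF covers an unsigned 16-bit wchar_t, 0x7FFF a signed one. */
#if WCHAR_MAX == 0xFFFF || WCHAR_MAX == 0x7FFF
#define WCHAR_IS_16_BIT 1
#else
#define WCHAR_IS_16_BIT 0
#endif

/* REQUIRE_16_BIT_WCHAR is a hypothetical opt-in macro: compile with
   -DREQUIRE_16_BIT_WCHAR to turn a non-16-bit wchar_t into a hard error. */
#if defined(REQUIRE_16_BIT_WCHAR) && !WCHAR_IS_16_BIT
#error "wchar_t is not 16 bits on this platform"
#endif

/* Cross-check the preprocessor result against the object size
   (assumes wchar_t has no padding bits). */
static_assert(WCHAR_IS_16_BIT == (sizeof(wchar_t) * CHAR_BIT == 16),
              "WCHAR_MAX and sizeof(wchar_t) disagree");
```

On platforms where wchar_t is 32 bits, such as Linux with glibc, WCHAR_IS_16_BIT is 0 and the build fails only when REQUIRE_16_BIT_WCHAR is defined.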



Source: https://stackoverflow.com/questions/50657874/how-do-you-safely-declare-a-16-bit-string-literal-in-c
