Is the u8 string literal necessary in C++11?

鱼传尺愫 2020-12-13 19:11

From Wikipedia:

For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be at least th

4 answers
  •  清歌不尽
    2020-12-13 19:45

    The compiler picks a native encoding that is natural for the platform. On typical POSIX systems this is probably ASCII, with character values outside the ASCII range interpreted according to the environment's settings. On mainframes it is probably EBCDIC. Comparing against strings received from, e.g., files or the command line will probably work best in the native character set. When processing files that are explicitly encoded in UTF-8, however, you are probably best off using u8"..." strings.
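To make the difference concrete, here is a minimal sketch that compares the bytes of a u8 literal with those of a plain narrow literal. The helper names `literal_bytes`, `utf8_e_acute`, and `native_e_acute` are made up for this illustration. The character U+00E9 is written as a universal-character-name so that the encoding of the source file does not matter.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies the bytes of a string literal (without the
// trailing NUL) into a std::string. The reinterpret_cast keeps this compiling
// under C++20 as well, where u8"..." has element type char8_t instead of char.
template <typename CharT, std::size_t N>
std::string literal_bytes(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "expects a narrow or UTF-8 literal");
    return std::string(reinterpret_cast<const char*>(lit), N - 1);
}

// UTF-8 literal: always encoded as UTF-8, whatever the platform.
inline std::string utf8_e_acute() { return literal_bytes(u8"\u00e9"); }

// Plain narrow literal: encoded in the implementation's execution character
// set, so the resulting bytes depend on the compiler and its settings.
inline std::string native_e_acute() { return literal_bytes("\u00e9"); }
```

`utf8_e_acute()` yields the two bytes 0xC3 0xA9 on every conforming implementation. What `native_e_acute()` yields is up to the compiler: the same two bytes if the execution character set is UTF-8, a single byte in a Latin-1 style code page, or something else entirely.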

    That said, the recent changes relating to character encodings broke a fundamental assumption of string processing in C and C++: each internal character object (char, wchar_t, etc.) used to represent one character. This is clearly no longer true for a UTF-8 string, where each character object represents just one byte of some character. As a result, the string manipulation functions, character classification functions, etc. won't necessarily work on these strings. We don't have any good library lined up for inclusion into the standard to deal with such strings.
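The mismatch between character objects and characters can be sketched as follows. `count_code_points` is a hypothetical helper, not a standard library function, and it assumes that its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded byte
// string by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;  // a lead byte or an ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the UTF-8 bytes of "naïve", written as `std::string("na\xC3\xAF" "ve")`, `size()` reports 6 while `count_code_points` reports 5, because `size()` (like `strlen`) counts char objects rather than characters. Even this count is in code points, which can still differ from what a reader perceives as characters when combining marks are involved.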
