C++ Literals and Unicode

Submitted by 为君一笑 on 2019-12-11 05:08:47

Question


C++ Literals

Environment:

  • OS: Windows 10 Pro;
  • Compiler: GCC latest.
  • IDE: Code::Blocks latest.
  • working on: Console applications.

My understanding of numeric literal prefixes is that they determine the type of the numeric value (I am not sure). However, I am quite confused about the prefixes and suffixes of character and string literals. I have read a lot and spent days trying to understand them, but I ended up with more questions than answers, so I thought Stack Overflow could help.

Qs:

Q1: What is the correct use of the string prefixes u8, u, U, and L?

I have the following code as example:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    cout << "\n\n Hello World! (plain) \n";
    cout << u8"\n Hello World! (u8) \n";
    cout << u"\n Hello World! (u) \n";
    cout << U"\n Hello World! (U) \n";
    cout << L"\n Hello World! (plain) \n\n";

    cout << "\n\n\n";
}

The output is like this:

Hello World! (plain)

Hello World! (u8)

0x47f0580x47f0840x47f0d8

Q2: Why do U, u, and L produce such output? I expected a prefix only to determine the type, not to do any encoding mapping (if that is what is happening).

Q3: Is there a simple, to-the-point reference about encodings such as UTF-8? I am confused about them, and I also doubt that console applications are capable of dealing with them. I think it is crucial to understand them.

Q4: I would also appreciate a step-by-step reference that explains custom type literals.


Answer 1:


First see: http://en.cppreference.com/w/cpp/language/string_literal

operator<< is overloaded for std::ostream (the class of std::cout) to print a const char* as a null-terminated string. That is why the first two strings are printed.

cout << "\n\n Hello World! (plain) \n";
cout << u8"\n Hello World! (u8) \n";

As expected, this prints [1]:

Hello World! (plain)

Hello World! (u8)

Meanwhile, std::ostream has no text-printing << overload for const char16_t*, const char32_t*, or const wchar_t*, so these arguments match the << overload for printing pointers (const void*). That is why:

cout << u"\n Hello World! (u) \n";
cout << U"\n Hello World! (U) \n";
cout << L"\n Hello World! (plain) \n\n";

Prints:

0x47f0580x47f0840x47f0d8

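As you can see, three pointer values are actually printed there: 0x47f058, 0x47f084 and 0x47f0d8. Since C++20, streaming these three pointer types into a narrow stream such as std::cout is a compile error instead.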


However, you can get the last one (the wchar_t literal) to print properly by using std::wcout:

std::wcout << L"\n Hello World! (plain) \n\n";

prints

 Hello World! (plain)

[1]: The u8 literal printed as expected because UTF-8 encodes the ASCII range (the first 128 code points) with the same single bytes as ASCII.




Answer 2:


1) Narrow multibyte string literal. The type of an unprefixed string literal is const char[].

2) Wide string literal. The type of a L"..." string literal is const wchar_t[].

3) UTF-8 encoded string literal. The type of a u8"..." string literal is const char[] (const char8_t[] since C++20).

4) UTF-16 encoded string literal. The type of a u"..." string literal is const char16_t[].

5) UTF-32 encoded string literal. The type of a U"..." string literal is const char32_t[].

6) Raw string literal. Used to avoid escaping any character; anything between the delimiters becomes part of the string. The prefix, if present, has the same meaning as described above.

std::cout prints text only for narrow (char-based) strings; given a pointer to any other character type, it prints the address, producing output such as 0x47f0580x47f0840x47f0d8. If you're trying to output literals made of wider character types, use std::wcout for wchar_t strings. char16_t and char32_t strings have no matching standard stream, so convert them to a narrow (for example UTF-8) or wchar_t string first.

Raw string literals are very handy for formatting output. An example of a raw string literal is R"~(This is the text that will be output just as I typed it into the code editor!)~", which is a narrow (char) string. If it is prefixed with one of the other encoding prefixes, the raw string literal has the corresponding character type. Here is a very comprehensive reference on string literals.



Source: https://stackoverflow.com/questions/42354126/c-literals-and-unicode
