C++ crash when using _setmode with _O_U8TEXT to handle Unicode

Submitted by 你离开我真会死。 on 2021-02-18 15:38:14

Question


Here is what I tried in order to print Unicode:

_setmode(_fileno(stdout), _O_U8TEXT);
string str = u8"unicode 한글 hangul";
cout << str << endl;

I used _setmode so that Unicode would be read and displayed correctly, but the program crashed with a "Debug Assertion Failed" error.

However,

_setmode(_fileno(stdout), _O_U16TEXT);
wstring str = L"unicode 한글 hangul";
wcout << str << endl;

With _O_U16TEXT, the code compiles and prints correctly.

What should I do to use UTF-8? Do I have to find another trick?


Answer 1:


The documentation for _setmode (finally) mentions _O_U8TEXT and _O_U16TEXT, but doesn't go into detail about what they do. It does state that these are translation modes.

The documentation for _wsopen lists (emphasis mine):

_O_U16TEXT
Opens a file in Unicode UTF-16 mode.
_O_U8TEXT
Opens a file in Unicode UTF-8 mode.

What this means is: when you use the wide-character I/O facilities (wprintf, std::wcout, etc.), which take Unicode (UTF-16) strings, the output is translated to either UTF-16 or UTF-8 as it is written to the file.
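
To make the idea of a translation mode concrete, here is a minimal, portable sketch of a UTF-16 to UTF-8 conversion. It only illustrates the kind of transformation that _O_U8TEXT requests and is not the CRT's actual implementation; to_utf8 is a hypothetical helper name introduced for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustration only: encodes a UTF-16 string as UTF-8 bytes.
// to_utf8 is a hypothetical helper, not a CRT or standard-library function.
std::string to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute U+FFFD
        }
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, to_utf8(u"\uD55C\uAE00") (the two syllables of 한글) yields the bytes ed 95 9c ea b8 80.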

Try this:

_setmode(_fileno(stdout), _O_U8TEXT);
std::wcout << L"unicode 한글 hangul\n";

You shouldn't see a difference on a console, but if you redirect the output:

> u8out | hexdump -C
00000000  75 6e 69 63 6f 64 65 20  ed 95 9c ea b8 80 20 68  |unicode ...... h|
00000010  61 6e 67 75 6c 0d 0a                              |angul..|
00000017

> u16out | hexdump -C
00000000  75 00 6e 00 69 00 63 00  6f 00 64 00 65 00 20 00  |u.n.i.c.o.d.e. .|
00000010  5c d5 00 ae 20 00 68 00  61 00 6e 00 67 00 75 00  |\... .h.a.n.g.u.|
00000020  6c 00 0d 00 0a 00                                 |l.....|
00000026

In theory this should mean that you can also use _O_U8TEXT on stdin to read UTF-8 input, but in practice that doesn't always work:

> u8in < u8.txt
unicode 한글 hangul €µöäüß

> u8in
unicode 한글 hangul €µöäüß
unicode ?? hangul ?攄��

_O_U16TEXT appears to work with console input (on my machine), but then you can't use UTF-8 encoded redirected input/output:

> u16in
unicode 한글 hangul €µöäüß
unicode 한글 hangul €µöäüß

You can read more about this here: Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?

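PS: What the assertion is telling you is that you can't use the narrow (ANSI) output facilities, such as cout or printf, on a stream that has been put into one of the Unicode modes. Curiously, writing Unicode text through those facilities is not checked at all if you don't set one of the Unicode modes, though...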
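
If the data already lives in a UTF-8 encoded std::string, as in the question, one possible workaround is to decode it into a wide string first and then write that through std::wcout. The sketch below is a simplified, portable decoder written for illustration; utf8_to_wide is a hypothetical helper name, and it does not reject every malformed sequence (for instance, overlong forms pass through).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a CRT or standard-library function): decodes UTF-8
// bytes into a wide string. Invalid or truncated sequences become U+FFFD.
// Simplification: overlong forms and encoded surrogates are not rejected.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0xFFFD;  // default for an invalid lead byte
        std::size_t len = 1;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }

        // Fold in the continuation bytes (each must look like 10xxxxxx).
        bool ok = (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { cp = 0xFFFD; len = 1; }
        i += len;

        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t (as on Windows): emit a surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

With stdout switched to _O_U8TEXT, the question's string could then be printed as std::wcout << utf8_to_wide(str) rather than through cout. On Windows, the Win32 function MultiByteToWideChar with the CP_UTF8 code page performs the same conversion and is probably the more robust choice.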



Source: https://stackoverflow.com/questions/45232484/c-crash-when-use-setmode-with-o-u8text-to-deal-with-unicode
