C++ UTF-8 output with ICU

Posted by 半世苍凉 on 2019-12-01 03:24:01

Your program will work if you just change the initializer to:

UnicodeString s("привет");

The macro you were using, UNICODE_STRING_SIMPLE, is only for strings that contain "invariant characters", i.e., only Latin letters, digits, and some punctuation.

As was said before, input/output codepages are tricky. You said:

My terminal and font support UTF-8 and I regularly use the terminal with UTF-8. My source code is in UTF-8.

That may be true, but ICU doesn't know that it's true. The process codepage might be something else (say, ISO-8859-1), and the output codepage might be something else again (say, Shift-JIS). In that case the program wouldn't work. Strings that contain only invariant characters and are built with UNICODE_STRING_SIMPLE would still work, though.

Hope this helps.

srl, icu dev

What happens if you write the output to a file (either by redirecting or piping it from the terminal, or by opening a file stream in the program itself)?

That would determine whether or not it is the terminal that fails to handle the output correctly.

What happens if you inspect the output string in the debugger? Does it contain the correct values? Find out what the UTF-8 encoding of your string should look like, and compare it against what you get in the debugger. Or print out the integral value of each byte, and verify that those are correct.
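A minimal sketch of the byte-by-byte check, using only the standard library. The helper name hexDump is made up for illustration. For reference, the UTF-8 encoding of "привет" is the twelve bytes D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of a string as two uppercase hex
// digits, separated by single spaces.
std::string hexDump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02X", c);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Run this on the bytes your program is about to write. If the dump matches the expected sequence, the program is producing correct UTF-8 and the problem is on the output side. If it does not match, the conversion inside the program is at fault.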

When working with encodings, it is always tricky (but essential) to determine whether the problem lies in your program itself or in the conversion that happens when the text is output to the system. Take the terminal out of the equation and verify that your program generates the correct output.

operator<<(ostream, UnicodeString) converts from UTF-16 to chars by using ICU's "default converter". As far as I understand, the "default converter" (if you don't set it explicitly with ucnv_setDefaultName()) depends on the platform and on the way ICU was compiled. What do you get from ucnv_getDefaultName()?
