Print char32_t to console

Submitted by 余生长醉 on 2019-12-08 06:20:54

Question


How can I print a char32_t to the console (with cout, wcout, ...) in C++11?

The following code prints the characters' numeric values instead of the characters themselves:

u32string s2 = U"Добрый день";
for(auto x:s2){
    wcout<<(char32_t)x<<endl;
}

Answer 1:


First, I don't think wcout is meant to print anything other than char and wchar_t as characters. char32_t is neither, so it gets printed as an integer.

Here's a sample program that prints an individual wchar_t:

#include <iostream>

using namespace std;

int main()
{
  wcout << (wchar_t)0x41 << endl;
  return 0;
}

Output (ideone):

A
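Building on that, one way to get char32_t text through wcout is to convert it to wchar_t first. The sketch below assumes a platform where wchar_t is 32 bits wide and wide strings hold UTF-32 (true for GCC/Clang on Linux, not on Windows, where wchar_t is 16-bit UTF-16); the names u32_to_wstring and print_u32 are hypothetical.

```cpp
#include <iostream>
#include <string>

// Assumption: wchar_t is 32 bits wide and wide strings hold UTF-32, as with
// GCC/Clang on Linux. On Windows (16-bit wchar_t) this sketch won't compile.
static_assert(sizeof(wchar_t) == sizeof(char32_t),
              "this sketch assumes a 32-bit wchar_t holding UTF-32");

// Copies each char32_t code point into a wchar_t unchanged.
std::wstring u32_to_wstring(const std::u32string& s)
{
    return std::wstring(s.begin(), s.end());
}

// Writes a UTF-32 string to wcout. Assumes the program has already called
// std::setlocale(LC_ALL, "") so that wide output is converted using the
// user's (e.g. UTF-8) locale.
void print_u32(const std::u32string& s)
{
    std::wcout << u32_to_wstring(s) << std::endl;
}
```

For example, calling std::setlocale(LC_ALL, "") (from <clocale>) once at the start of main and then print_u32(U"Добрый день") should print the greeting on a UTF-8 Linux terminal. Because stdout takes a narrow or wide orientation on first use, it is safest not to mix cout and wcout in the same program.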

Currently, it's impossible to get consistent Unicode output in the console, even on major OSes. Naive Unicode text output via cout, wcout, printf(), wprintf() and the like won't work on Windows without major hacks. The difficulty with getting readable Unicode text in the Windows console lies in having, and being able to select, proper Unicode fonts; Windows' console is quite broken in this respect. See this answer of mine and follow the link(s) in it.




Answer 2:


I know this is very old, but I had to solve it myself, so here it is. The idea is to convert from UTF-32 to UTF-8: cout can print UTF-8 encoded char strings (provided the terminal expects UTF-8), so just translate each UTF-32 encoded char32_t into UTF-8 and you're done. These are the low-level functions I came up with (no modern C++ facilities). They can probably be optimized; any suggestions are welcome.

#include <stdexcept>

char* char_utf32_to_utf8(char32_t utf32, char* buffer)
// Encodes the UTF-32 code point as UTF-8 into buffer, appends a
// terminating '\0' and returns a pointer to that '\0'.
// (Unchecked access: buffer must hold at least 5 chars.)
// Hex masks are used instead of C++14 binary literals so this builds as C++11.
{
    char* end = buffer;
    if(utf32 < 0x80) {                           // 1 byte:  0xxxxxxx
        *(end++) = static_cast<char>(utf32);
    }
    else if(utf32 < 0x800) {                     // 2 bytes: 110xxxxx 10xxxxxx
        *(end++) = static_cast<char>(0xC0 | (utf32 >> 6));
        *(end++) = static_cast<char>(0x80 | (utf32 & 0x3F));
    }
    else if(utf32 < 0x10000) {                   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if(utf32 >= 0xD800 && utf32 <= 0xDFFF)   // UTF-16 surrogates must not be encoded
            throw std::invalid_argument("surrogate code point");
        *(end++) = static_cast<char>(0xE0 | (utf32 >> 12));
        *(end++) = static_cast<char>(0x80 | ((utf32 >> 6) & 0x3F));
        *(end++) = static_cast<char>(0x80 | (utf32 & 0x3F));
    }
    else if(utf32 < 0x110000) {                  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        *(end++) = static_cast<char>(0xF0 | (utf32 >> 18));
        *(end++) = static_cast<char>(0x80 | ((utf32 >> 12) & 0x3F));
        *(end++) = static_cast<char>(0x80 | ((utf32 >> 6) & 0x3F));
        *(end++) = static_cast<char>(0x80 | (utf32 & 0x3F));
    }
    else throw std::invalid_argument("code point above U+10FFFF");
    *end = '\0';
    return end;
}

You can implement this function in a class if you want, in a constructor, in a template, or whatever you prefer.
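For instance, a minimal sketch of one such variant: a free function (hypothetically named to_utf8) that encodes a whole std::u32string and returns a std::string, so callers never size a buffer by hand. It applies the same bit layout, with the surrogate and range checks made explicit.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a whole UTF-32 string to UTF-8.
// Throws std::invalid_argument for surrogates or values above U+10FFFF.
std::string to_utf8(const std::u32string& in)
{
    std::string out;
    out.reserve(in.size());   // at least one byte per code point
    for (char32_t cp : in) {
        if (cp < 0x80) {                                   // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                           // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                         // 3 bytes
            if (cp >= 0xD800 && cp <= 0xDFFF)
                throw std::invalid_argument("surrogate code point");
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x110000) {                        // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::invalid_argument("code point above U+10FFFF");
        }
    }
    return out;
}
```

With this, printing is just std::cout << to_utf8(s2); and the output operator overloads become optional.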

Next comes the overloaded operator<< for a null-terminated char32_t array:

#include <iostream>
#include <string>

std::ostream& operator<<(std::ostream& os, const char32_t* s)
{
    char buffer[5] {}; // That's the famous "big-enough buffer" (4 bytes + '\0'); not const, since it is written to
    while(s && *s)
    {
        char_utf32_to_utf8(*(s++), buffer);
        os << buffer;
    }
    return os;
}

and the overload for std::u32string:

std::ostream& operator<<(std::ostream& os, const std::u32string& s)
{
    return (os << s.c_str());
}
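As an aside for the C++11 setting of the question, the standard library itself shipped a UTF-32 to UTF-8 converter: std::wstring_convert combined with std::codecvt_utf8<char32_t>. A minimal sketch (the function name to_utf8_std is hypothetical) that could stand in for the hand-written encoder:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-32 -> UTF-8 using the converters that shipped with C++11.
// std::wstring_convert and std::codecvt_utf8 are deprecated since C++17 and
// removed in C++26, so compilers may emit deprecation warnings.
std::string to_utf8_std(const std::u32string& s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);   // throws std::range_error if conversion fails
}
```

Usage would be std::cout << to_utf8_std(U"\x10437\x20AC") << '\n';. Given the deprecation, this mainly suits older code bases; a hand-written encoder avoids depending on it.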

Running the simplest possible test, with a couple of Unicode characters taken from Wikipedia:

int main()
{
    std::cout << std::u32string(U"\x10437\x20AC") << std::endl;
}

prints 𐐷€ on a (Linux) console whose terminal uses UTF-8. This should still be tested with other Unicode characters, though.
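The most informative test inputs are the code points on either side of each UTF-8 length boundary (U+007F/U+0080, U+07FF/U+0800, U+FFFF/U+10000, U+10FFFF/U+110000), plus the surrogate range U+D800 to U+DFFF, which UTF-8 must not encode. A small sketch, using the hypothetical name utf8_length, that spells out those limits:

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes UTF-8 uses for the code point cp,
// or 0 if cp cannot be encoded (a UTF-16 surrogate or a value above U+10FFFF).
// Each limit is the largest value a sequence of that length can carry:
// 7 payload bits (1 byte), 11 (2 bytes), 16 (3 bytes), 21 capped at U+10FFFF (4 bytes).
std::size_t utf8_length(char32_t cp)
{
    if (cp < 0x80) return 1;                      // U+0000..U+007F
    if (cp < 0x800) return 2;                     // U+0080..U+07FF
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   // surrogates are not scalar values
    if (cp < 0x10000) return 3;                   // U+0800..U+FFFF (minus surrogates)
    if (cp < 0x110000) return 4;                  // U+10000..U+10FFFF
    return 0;                                     // outside the Unicode range
}
```

Encoding each boundary value and comparing the length of the result with utf8_length is a cheap first check.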

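Endianness does not affect this conversion, since it operates on char32_t values rather than on raw bytes; byte order only matters when reading serialized UTF-32 data (for example from a file) into char32_t.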
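Byte order enters the picture when UTF-32 text arrives as raw bytes, for example from a file or a network buffer. Once the data is held as char32_t values, the encoder works on numbers and is unaffected by the host's endianness. A minimal sketch, assuming little-endian input (UTF-32LE) and using the hypothetical name decode_utf32le; for UTF-32BE the byte indices would be reversed:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: builds char32_t values from raw UTF-32LE bytes
// (least significant byte first). Shifts make it independent of host byte order.
std::u32string decode_utf32le(const std::vector<unsigned char>& bytes)
{
    if (bytes.size() % 4 != 0)
        throw std::invalid_argument("UTF-32 input length must be a multiple of 4");
    std::u32string out;
    out.reserve(bytes.size() / 4);
    for (std::size_t i = 0; i < bytes.size(); i += 4) {
        out += static_cast<char32_t>(
              static_cast<char32_t>(bytes[i])
            | static_cast<char32_t>(bytes[i + 1]) << 8
            | static_cast<char32_t>(bytes[i + 2]) << 16
            | static_cast<char32_t>(bytes[i + 3]) << 24);
    }
    return out;
}
```

The resulting std::u32string can then go through the same UTF-8 output path as any other.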



Source: https://stackoverflow.com/questions/15857721/print-char32-t-to-console
