UTF-8 output on Windows console

無奈伤痛 2020-12-10 06:03

The following code shows unexpected behaviour on my machine (tested with Visual C++ 2008 SP1 on Windows XP and VS 2012 on Windows 7):

#include 

        
3 Answers
  • 2020-12-10 06:08

    It's time to close this now. Stephan T. Lavavej says the behaviour is "by design", although I cannot follow this explanation.

    My current understanding is that the Windows XP console, when set to the UTF-8 code page (65001), does not work with C++ iostreams.

    Windows XP is going out of fashion now, and so is VS 2008. I'd be interested to hear whether the problem still exists on newer Windows systems.

    On Windows 7 the effect is probably due to the way the C++ streams output characters. As seen in an answer to "Properly print utf8 characters in windows console", UTF-8 output also fails with C stdio when the bytes are printed one after another, as in putc('\xc3'); putc('\xbc');. Perhaps this is what the C++ streams do here.
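If the failure really does come from the console receiving a multi-byte sequence in pieces, one way to avoid it is to hand over only whole sequences and hold back any incomplete tail. The helper below is a minimal sketch of that idea. The name complete_utf8_prefix is hypothetical and not part of the original posts, and the function only inspects the end of the buffer; it does not validate the text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length of the longest prefix of `bytes`
// that does not end in the middle of a multi-byte UTF-8 sequence.
// It does not validate the input; malformed data is passed through.
std::size_t complete_utf8_prefix(const std::string& bytes)
{
    std::size_t const size = bytes.size();

    // Walk back over at most three trailing continuation bytes (10xxxxxx).
    std::size_t i = size;
    std::size_t back = 0;
    while (i > 0 && back < 3 &&
           (static_cast<unsigned char>(bytes[i - 1]) & 0xC0) == 0x80)
    {
        --i;
        ++back;
    }

    if (i == 0)
    {
        return size; // no lead byte found, nothing sensible to hold back
    }

    // The byte before the continuation bytes decides the sequence length.
    unsigned char const lead = static_cast<unsigned char>(bytes[i - 1]);
    std::size_t expected = 1;
    if ((lead & 0xE0) == 0xC0)      expected = 2;
    else if ((lead & 0xF0) == 0xE0) expected = 3;
    else if ((lead & 0xF8) == 0xF0) expected = 4;

    std::size_t const have = size - (i - 1);
    return have < expected ? i - 1 : size;
}
```

A caller would write the first complete_utf8_prefix(buffer) bytes in a single call and keep the remainder until more bytes arrive.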

  • 2020-12-10 06:17

    I understand the question is quite old, but in case someone is still interested, below is my solution. I've implemented a fairly simple std::streambuf descendant and installed it on each of the standard streams at the very beginning of program execution.

    This allows you to use UTF-8 everywhere in your program. On input, data is read from the console as wide-character Unicode (UTF-16), converted, and returned to you as UTF-8. On output the opposite happens: your UTF-8 data is converted to UTF-16 and sent to the console. I have found no issues so far.

    Also note that this solution doesn't require changing the code page, whether with SetConsoleCP, SetConsoleOutputCP, chcp, or anything else.

    That's the stream buffer:

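    #include <windows.h>
    #include <streambuf>
    #include <string>
    
    // Wraps a console handle and converts between UTF-8 (program side)
    // and UTF-16 (console side).
    class ConsoleStreamBufWin32 : public std::streambuf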
    {
    public:
        ConsoleStreamBufWin32(DWORD handleId, bool isInput);
    
    protected:
        // std::basic_streambuf
        virtual std::streambuf* setbuf(char_type* s, std::streamsize n);
        virtual int sync();
        virtual int_type underflow();
        virtual int_type overflow(int_type c = traits_type::eof());
    
    private:
        HANDLE const m_handle;
        bool const m_isInput;
        std::string m_buffer;
    };
    
    ConsoleStreamBufWin32::ConsoleStreamBufWin32(DWORD handleId, bool isInput) :
        m_handle(::GetStdHandle(handleId)),
        m_isInput(isInput),
        m_buffer()
    {
        if (m_isInput)
        {
            setg(0, 0, 0);
        }
    }
    
    std::streambuf* ConsoleStreamBufWin32::setbuf(char_type* /*s*/, std::streamsize /*n*/)
    {
        return 0;
    }
    
    int ConsoleStreamBufWin32::sync()
    {
        if (m_isInput)
        {
            ::FlushConsoleInputBuffer(m_handle);
            setg(0, 0, 0);
        }
        else
        {
            if (m_buffer.empty())
            {
                return 0;
            }
    
            std::wstring const wideBuffer = utf8_to_wstring(m_buffer);
            DWORD writtenSize;
            // WriteConsoleW takes the length as a DWORD count of wide characters.
            ::WriteConsoleW(m_handle, wideBuffer.c_str(), static_cast<DWORD>(wideBuffer.size()), &writtenSize, NULL);
        }
    
        m_buffer.clear();
    
        return 0;
    }
    
    ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::underflow()
    {
        if (!m_isInput)
        {
            return traits_type::eof();
        }
    
        if (gptr() >= egptr())
        {
            wchar_t wideBuffer[128];
            DWORD readSize;
            if (!::ReadConsoleW(m_handle, wideBuffer, ARRAYSIZE(wideBuffer) - 1, &readSize, NULL))
            {
                return traits_type::eof();
            }
    
            wideBuffer[readSize] = L'\0';
            m_buffer = wstring_to_utf8(wideBuffer);
    
            setg(&m_buffer[0], &m_buffer[0], &m_buffer[0] + m_buffer.size());
    
            if (gptr() >= egptr())
            {
                return traits_type::eof();
            }
        }
    
        return sgetc();
    }
    
    ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::overflow(int_type c)
    {
        if (m_isInput)
        {
            return traits_type::eof();
        }
    
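        if (traits_type::eq_int_type(c, traits_type::eof()))
        {
            // overflow() may be called with EOF; there is nothing to store then.
            return traits_type::not_eof(c);
        }
    
        m_buffer += traits_type::to_char_type(c);
        return c;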
    }
    

    The usage then is as follows:

    template<typename StreamT>
    inline void FixStdStream(DWORD handleId, bool isInput, StreamT& stream)
    {
        if (::GetFileType(::GetStdHandle(handleId)) == FILE_TYPE_CHAR)
        {
            stream.rdbuf(new ConsoleStreamBufWin32(handleId, isInput));
        }
    }
    
    // ...
    
    int main()
    {
        FixStdStream(STD_INPUT_HANDLE, true, std::cin);
        FixStdStream(STD_OUTPUT_HANDLE, false, std::cout);
        FixStdStream(STD_ERROR_HANDLE, false, std::cerr);
    
        // ...
    
        std::cout << "\xc3\xbc" << std::endl;
    
        // ...
    }
    

    The omitted wstring_to_utf8 and utf8_to_wstring helpers can easily be implemented with the WideCharToMultiByte and MultiByteToWideChar WinAPI functions.
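The two conversion helpers could look roughly like this. It is a sketch under assumptions, not the author's code: the signatures are inferred from how the functions are called in the stream buffer, error handling is reduced to returning an empty string, and the whole block is guarded with _WIN32 because it only builds on Windows.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Assumed signature: converts UTF-8 bytes to a UTF-16 wide string.
std::wstring utf8_to_wstring(const std::string& utf8)
{
    if (utf8.empty())
    {
        return std::wstring();
    }

    int const inputSize = static_cast<int>(utf8.size());

    // First call: ask for the required number of wide characters.
    int const wideSize = ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(), inputSize, NULL, 0);
    if (wideSize <= 0)
    {
        return std::wstring(); // conversion failed
    }

    // Second call: convert into the allocated buffer.
    std::wstring wide(static_cast<std::size_t>(wideSize), L'\0');
    ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(), inputSize, &wide[0], wideSize);
    return wide;
}

// Assumed signature: converts a UTF-16 wide string to UTF-8 bytes.
std::string wstring_to_utf8(const std::wstring& wide)
{
    if (wide.empty())
    {
        return std::string();
    }

    int const inputSize = static_cast<int>(wide.size());

    // For CP_UTF8 the last two parameters must be NULL.
    int const utf8Size = ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), inputSize, NULL, 0, NULL, NULL);
    if (utf8Size <= 0)
    {
        return std::string(); // conversion failed
    }

    std::string utf8(static_cast<std::size_t>(utf8Size), '\0');
    ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), inputSize, &utf8[0], utf8Size, NULL, NULL);
    return utf8;
}
#endif // _WIN32
```

Both functions pass an explicit length rather than -1, so the result carries no extra terminating null character.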

  • 2020-12-10 06:21

    Oi. Congratulations on finding a way to change the code page of the console from inside your program. I didn't know about that call; I always had to use chcp.

    I'm guessing the C++ default locale is getting involved. By default it will use the code page provided by GetThreadLocale() to determine the text encoding of non-wstring data. This generally defaults to CP1252. You could try using SetThreadLocale() to switch to UTF-8 (if it even supports that; I can't recall), in the hope that std::locale then defaults to something that can handle your UTF-8 encoding.
