Encode/Decode std::string to UTF-16

难免孤独 2020-12-09 20:20

I have to handle a file format (both read from and write to it) in which strings are encoded in UTF-16 (2 bytes per character). Since characters out of the ASCII table are r

3 answers
  • 2020-12-09 21:03

    I would suggest having a look at the related question "Convert C++ std::string to UTF-16-LE encoded string".

    Also check out the iconv function. It comes from a C library, so it does not require C++11.

    There's also a Win32-specific iconv library at https://github.com/win-iconv/win-iconv.
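A minimal sketch of the iconv route, assuming a POSIX-style iconv (as in glibc) that knows the encoding names "UTF-8" and "UTF-16LE". The helper name convert_encoding is made up for illustration, and the const_cast is there because some platforms declare the input pointer as const char** while others use char**.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from encoding `from` to encoding `to`.
// The result is returned as raw bytes in a std::string.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: target first
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed: unsupported conversion");
    }

    std::string output;
    char buffer[256];
    char* in = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out = buffer;
        std::size_t out_left = sizeof buffer;
        std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;  // save before any other call can change it
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the chunk buffer filled up; loop again.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }

    iconv_close(cd);
    return output;
}
```

Calling convert_encoding(s, "UTF-8", "UTF-16LE") yields the UTF-16 bytes without a byte order mark; swapping the two names converts back. Stateful encodings would also need a final flush call, which this sketch leaves out.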

  • 2020-12-09 21:06

    Did you look at Boost.Locale? Its documentation describes how to do UTF-to-UTF conversions and how to integrate them with IOStreams.

  • 2020-12-09 21:15

    C++11 has this functionality:

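    // #include <locale>   // std::wstring_convert, std::codecvt
    // #include <string>
    std::string s = u8"Hello, World!"; // pre-C++20; in C++20 u8"" literals are char8_t

    // Needs a facet with an accessible destructor; see the adapter further down.
    std::wstring_convert<std::codecvt<char16_t,char,std::mbstate_t>,char16_t> convert;

    std::u16string u16 = convert.from_bytes(s);  // UTF-8 -> UTF-16
    std::string u8 = convert.to_bytes(u16);      // UTF-16 -> UTF-8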
    

    However, to my knowledge, the only implementation that has this so far is libc++. C++11 also has std::codecvt_utf8_utf16<char16_t>, which some other implementations support. Specifically, codecvt_utf8_utf16 works in VS 2010 and above, and since Windows uses wchar_t to represent UTF-16, you can use it to convert between UTF-8 and Windows' native encoding.
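A minimal sketch of the codecvt_utf8_utf16 route, assuming a standard library that ships <codecvt>. The helper names utf8_to_utf16 and utf16_to_utf8 are made up for illustration. Note that std::wstring_convert and the <codecvt> facets were deprecated in C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>  // std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units.
// from_bytes throws std::range_error on malformed input.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: UTF-16 code units -> UTF-8 bytes.
std::string utf16_to_utf8(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}
```

Unlike std::codecvt itself, codecvt_utf8_utf16 has a public destructor, so it can be handed to wstring_convert directly. Characters outside the Basic Multilingual Plane come back as two char16_t code units (a surrogate pair), so "2 bytes per character" only holds for BMP text.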


    "The specialization codecvt<char16_t, char, mbstate_t> converts between the UTF-16 and UTF-8 encoding schemes, and the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes."
    (C++ standard, [locale.codecvt] 22.4.1.4/3)


    One more thing: std::codecvt specializations have protected destructors, and wstring_convert needs to destroy the facet it owns, so you need an adapter that exposes a public destructor:

    template <class Facet>
    class usable_facet : public Facet {
    public:
        using Facet::Facet; // inherit constructors
        ~usable_facet() {}
    
        // workaround for compilers without inheriting constructors:
        // template <class ...Args> usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    };
    
    template<typename internT, typename externT, typename stateT> 
    using codecvt = usable_facet<std::codecvt<internT, externT, stateT>>;
    
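    // The second template argument must match the facet's internal type;
    // it defaults to wchar_t, which does not fit a char16_t facet.
    std::wstring_convert<codecvt<char16_t,char,std::mbstate_t>,char16_t> convert;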
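Putting the adapter to work might look like the sketch below. The names utf16_facet, from_utf8, to_utf8, to_utf16le_bytes and from_utf16le_bytes are made up for illustration, and the byte layout is an assumption: the question does not say which byte order the file format uses, so this sketch picks little-endian with no byte order mark.

```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::wstring_convert
#include <string>

// Adapter from the answer: gives the facet a public destructor.
template <class Facet>
struct usable_facet : Facet {
    using Facet::Facet;  // inherit constructors
    ~usable_facet() {}
};

using utf16_facet = usable_facet<std::codecvt<char16_t, char, std::mbstate_t>>;

// UTF-8 bytes -> UTF-16 code units (throws std::range_error on bad input).
std::u16string from_utf8(const std::string& utf8) {
    std::wstring_convert<utf16_facet, char16_t> conv;
    return conv.from_bytes(utf8);
}

// UTF-16 code units -> UTF-8 bytes.
std::string to_utf8(const std::u16string& units) {
    std::wstring_convert<utf16_facet, char16_t> conv;
    return conv.to_bytes(units);
}

// Assumption: the file stores each code unit low byte first (UTF-16LE).
std::string to_utf16le_bytes(const std::u16string& units) {
    std::string out;
    out.reserve(units.size() * 2);
    for (char16_t u : units) {
        out.push_back(static_cast<char>(u & 0xFF));
        out.push_back(static_cast<char>((u >> 8) & 0xFF));
    }
    return out;
}

// Inverse of to_utf16le_bytes; a trailing odd byte is ignored.
std::u16string from_utf16le_bytes(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned char lo = static_cast<unsigned char>(bytes[i]);
        unsigned char hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

Writing the bytes out explicitly avoids depending on the host's endianness, which a plain memcpy of the char16_t buffer would bake into the file.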
    