UTF-8-compliant IOstreams

Submitted by 痞子三分冷 on 2019-11-28 10:09:30

Question


Does GCC's standard library, Boost, or any other library implement iostream-compliant versions of ifstream or ofstream that support conversion between UTF-8-encoded (file) streams and a std::vector<wchar_t> or std::wstring?


Answer 1:


Your question doesn't quite work. UTF-8 is a specific encoding, while wchar_t is a data type. Moreover, the standard intends wchar_t to represent the system's character set, but which encoding that is gets left entirely to the platform; the standard makes no requirements.

Therefore, the right thing to ask for is, first of all, conversion between the system's narrow multibyte encoding and the system's fixed-width wide-character encoding. This functionality is provided by std::mbstowcs and std::wcstombs, which convert according to the current C locale (as selected with std::setlocale). The locale facet std::codecvt<wchar_t, char, std::mbstate_t> wraps the same conversion for use with streams, but that's a bit of a niche area of the library.

If you want to convert between the opaque "system's encoding" prescribed by the standard and a definite encoding prescribed by your serialized data source or sink, you need an extra library. I'd recommend POSIX iconv(), which is widely available. (The Windows API takes a different approach and offers dedicated conversion functions such as MultiByteToWideChar and WideCharToMultiByte.)

C++11 alleviates the issue slightly by adding an explicit family of UTF-encoded string types and literals (char16_t and char32_t, std::u16string and std::u32string, and the u8, u, and U literal prefixes), and presumably also transcoding facilities among those (though I've never seen them implemented by anyone).

Here are my standard responses from past posts on the subject: Q1, Q2, Q3. C++11 will be a joy once it's fully available :-)




Answer 2:


The C++11 solution is to wrap the UTF-8 stream's buffer in an appropriate std::wbuffer_convert:

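#include <codecvt>   // std::codecvt_utf8
#include <fstream>
#include <locale>    // std::wbuffer_convert
#include <string>
int main()
{
    std::ifstream utf8file("test.txt"); // if the file holds UTF-8 data
    // Decode the file's bytes from UTF-8 into wchar_t as they are read
    // (std::wbuffer_convert and std::codecvt_utf8 are deprecated since C++17).
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(utf8file.rdbuf());
    std::wistream ucsbuf(&conv);
    std::wstring line;
    std::getline(ucsbuf, line); // line now holds UCS-2 or UCS-4, depending on the OS
}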

This works with Visual Studio 2010 and with clang++/libc++ but, unfortunately, not with GCC (libstdc++ only gained <codecvt> and std::wbuffer_convert in GCC 5).

Until this becomes widespread, third-party libraries are indeed the best solution.



Source: https://stackoverflow.com/questions/7889032/utf-8-compliant-iostreams
