How to read a UTF-16 text file in C++17

Submitted by 不打扰是莪最后的温柔 on 2021-02-19 05:57:06

Question


I am very new to C++. I want to read a UTF-16 text file in C++17 in Visual Studio 2019.

I have tried several methods found on the internet (including StackOverflow), but none of them worked, and some of them didn't even compile (I think they only support older compilers).

I am trying to achieve this without using any 3rd party libraries.

The following code reads the text file, but the output has some weird characters and a space between each letter.

// open file for reading
std::wifstream istrm(filename, std::ios::binary);
if (!istrm.is_open()) {
    std::cout << "failed to open " << filename << '\n';
}
else {
    std::wstring s;
    std::getline(istrm, s);
    std::wcout << s << std::endl;
}

Then I found some solutions for this that use the following headers:

#include <locale>
#include <codecvt>

// open file for reading
std::wifstream istrm(filename, std::ios::binary);
istrm.imbue(std::locale(istrm.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
if (!istrm.is_open()) {
    std::cout << "failed to open " << filename << '\n';
}
else {
    std::wstring s;
    std::getline(istrm, s);
    std::wcout << s << std::endl;
}

This time it didn't even compile. I got the following error at the std::codecvt_utf16 line:

Error C4996 'std::codecvt_utf16': warning STL4017: std::wbuffer_convert, std::wstring_convert, and the <codecvt> header (containing std::codecvt_mode, std::codecvt_utf8, std::codecvt_utf16, and std::codecvt_utf8_utf16) are deprecated in C++17. (The std::codecvt class template is NOT deprecated.) The C++ Standard doesn't provide equivalent non-deprecated functionality; consider using MultiByteToWideChar() and WideCharToMultiByte() from <Windows.h> instead. You can define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING or _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS to acknowledge that you have received this warning.

I would appreciate it if someone could provide a solution for this.

Thanks in advance.


Answer 1:


First of all, read related questions like Does std::wstring support UTF-16 and UTF-32 on Windows? and Is 16-bit wchar_t formally valid for representing full Unicode?.

If all you want is to read/write strings as a blob whose encoding you already know to be UTF-16, without performing any conversion or manipulation, and you are in an environment like Visual Studio 2019 on Windows, where wchar_t is intended to hold UTF-16, then you can use the C++ wide strings and streams.
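As a rough illustration of the blob approach that does not depend on the width of wchar_t, the sketch below reads the file's bytes and pairs them into UTF-16 code units, using a leading byte order mark (BOM) to pick the byte order. The names decode_utf16_bytes and read_utf16_file are hypothetical, and falling back to little-endian when there is no BOM is an assumption (it is the common case on Windows), not a guarantee. It likely also explains the spaced-out letters in the question's first attempt: without a converting facet, the wide stream widens each byte on its own, so the zero high bytes come out as separate characters.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turns raw file bytes into UTF-16 code units.
// A leading BOM (FF FE or FE FF) selects the byte order and is dropped.
// Assumption: without a BOM the data is treated as little-endian.
std::u16string decode_utf16_bytes(const std::vector<unsigned char>& bytes) {
    std::size_t pos = 0;
    bool little_endian = true;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            pos = 2;                      // UTF-16LE BOM
        } else if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            little_endian = false;        // UTF-16BE BOM
            pos = 2;
        }
    }

    std::u16string out;
    out.reserve((bytes.size() - pos) / 2);
    // Combine two bytes per code unit; a trailing odd byte is ignored.
    for (; pos + 1 < bytes.size(); pos += 2) {
        const unsigned lo = little_endian ? bytes[pos] : bytes[pos + 1];
        const unsigned hi = little_endian ? bytes[pos + 1] : bytes[pos];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}

// Hypothetical helper: reads the whole file in binary mode and decodes it.
std::u16string read_utf16_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("failed to open " + path);
    }
    const std::vector<unsigned char> bytes(
        (std::istreambuf_iterator<char>(in)),
        std::istreambuf_iterator<char>());
    return decode_utf16_bytes(bytes);
}
```

On Windows, where wchar_t is 16 bits wide, the returned code units can be copied into a std::wstring (for example `std::wstring(s.begin(), s.end())`) and passed to wide APIs unchanged. No validation of surrogate pairs happens here; the text is only carried around, not interpreted.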

Now, if you need to perform conversions, support several encodings, iterate within strings (for some definitions of iterate), or in general anything non-trivial, you are out of luck at the moment if you want to stay within C++17. The C++ Standard committee has established a working group for Unicode, so expect to see some improvements in this area in the upcoming years. For the moment, you will need to use either Win32 functions like MultiByteToWideChar and WideCharToMultiByte, or an external library like International Components for Unicode (ICU) or Boost's Locale.
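To give an idea of what such a conversion involves, here is a minimal hand-written sketch that turns UTF-16 code units into UTF-8 using only the standard library. It is a portable stand-in for illustration, not the Win32 call itself: on Windows, WideCharToMultiByte with the CP_UTF8 code page does this job. The function name utf16_to_utf8 is hypothetical, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: converts UTF-16 code units to a UTF-8 byte string.
// Assumption of this sketch: unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                const std::uint32_t low = in[i + 1];
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate with no preceding high surrogate
        }

        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A routine like this covers only one encoding pair. Anything beyond that, such as normalization, collation, or legacy code pages, is where the Win32 functions, ICU, or Boost.Locale become the more practical option.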



Source: https://stackoverflow.com/questions/56723436/how-to-read-a-utf-16-text-file-in-c17
