char vs wchar_t: when to use which data type

梦谈多话 2020-12-14 19:31

I want to understand the difference between char and wchar_t. I understand that wchar_t uses more bytes, but can I get a clear-cut example…

3 Answers
  •  忘掉有多难
    2020-12-14 19:46

    Short answer:

    You should never use wchar_t in modern C++, except when interacting with OS-specific APIs (basically use wchar_t only to call Windows API functions).
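    To make the distinction concrete, here is a minimal sketch (an illustration, not part of the original answer) of the same text stored as UTF-8 in a std::string and as a std::wstring. The byte escapes avoid any dependence on the source-file encoding. The wchar_t sizes in the comments (4 bytes on Linux/glibc, 2 bytes on Windows/MSVC) are the common ones, but the standard leaves them implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "héllo": 'é' (U+00E9) takes two bytes in UTF-8 (0xC3 0xA9), so the
// UTF-8 std::string holds 6 chars for 5 characters of text.
const std::string utf8_text = "h\xC3\xA9llo";

// The same text as a wide string: one wchar_t per character here,
// because U+00E9 fits in a single UTF-16 or UTF-32 code unit.
const std::wstring wide_text = L"h\u00E9llo";

// U+1F600 lies outside the Basic Multilingual Plane. With a 4-byte
// wchar_t (typical on Linux) it is one code unit. With a 2-byte wchar_t
// (Windows) it becomes a UTF-16 surrogate pair, i.e. two code units.
const std::wstring wide_emoji = L"\U0001F600";

// Number of wchar_t units the emoji is expected to occupy on this platform.
constexpr std::size_t expected_emoji_units = sizeof(wchar_t) == 4 ? 1 : 2;
```

    The emoji case shows the portability problem in a nutshell. On Windows a std::wstring does not hold one element per character either, so it offers no real advantage over UTF-8 in a std::string.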

    Long answer:

    The design of the standard C++ library implies that there is only one way to handle Unicode: store UTF-8 encoded strings in char arrays. Almost all functions exist only in char variants (think of std::exception::what).
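    Because std::exception::what returns a plain const char *, UTF-8 text passes through such interfaces as ordinary bytes. The following is a minimal sketch that assumes valid UTF-8 input. count_code_points and message_of_failure are hypothetical helpers written for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Counts Unicode code points in a UTF-8 string, assuming valid UTF-8:
// every byte except a continuation byte (10xxxxxx) starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80) ++count;
    return count;
}

// std::runtime_error stores the message as plain char bytes. Nothing in
// the library knows or cares that they happen to be UTF-8.
std::string message_of_failure() {
    try {
        throw std::runtime_error("cannot open caf\xC3\xA9.txt");
    } catch (const std::exception& e) {
        return e.what();
    }
}
```

    Only code that needs to reason about characters (counting, truncating, rendering) has to understand UTF-8. Everything else simply copies bytes.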

    In a C++ program you have two locales:

    • Standard C library locale set by std::setlocale
    • Standard C++ library locale set by std::locale::global
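    Here is a small sketch for inspecting both locales. c_locale_name and cpp_locale_name are hypothetical helpers. At program start both should report "C".

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <string>

// Name of the locale used by the C library (printf, mbstowcs, ...).
// Passing nullptr queries the current locale without changing it.
std::string c_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Name of the global C++ locale. A default-constructed std::locale,
// and therefore every newly created stream, starts as a copy of it.
std::string cpp_locale_name() {
    return std::locale().name();
}
```

    One detail is worth knowing. When the new locale has a name, std::locale::global also calls std::setlocale(LC_ALL, ...). std::setlocale, however, never changes the global C++ locale.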

    Unfortunately, neither of them defines the behavior of standard functions that open files (such as std::fopen and std::fstream::open). The behavior differs between OSes:

    • Linux is encoding-agnostic, so those functions simply pass the char string to the underlying system call.
    • On Windows, the char string is converted to a wide string using the user-specific locale before the system call is made.
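    The following sketch illustrates the Linux behavior. It assumes a POSIX system and a filesystem that stores names as raw bytes (ext4, tmpfs and the like). The code creates a file through the narrow const char * API, then checks that the directory listing reports exactly the same UTF-8 bytes. narrow_name_round_trips and the directory name char_vs_wchar_demo are hypothetical. Under a non-UTF-8 code page on Windows, the check would be expected to fail, because the name is converted before it reaches the system call.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

// Creates a file via the narrow (char) API and reports whether the
// directory listing shows the same UTF-8 bytes. Assumes a POSIX system,
// where the bytes go straight to open(2).
bool narrow_name_round_trips(const std::string& utf8_name) {
    namespace fs = std::filesystem;
    const fs::path dir = fs::temp_directory_path() / "char_vs_wchar_demo";
    fs::create_directories(dir);

    // Open through the const char* overload, the API discussed above.
    const std::string full = dir.string() + "/" + utf8_name;
    {
        std::ofstream out(full.c_str());
        out << "x";
    }

    bool found = false;
    for (const auto& entry : fs::directory_iterator(dir)) {
        // u8string() returns std::string in C++17 and std::u8string in
        // C++20. Copying through iterators works with either.
        const auto u8 = entry.path().filename().u8string();
        if (std::string(u8.begin(), u8.end()) == utf8_name) found = true;
    }

    fs::remove_all(dir);
    return found;
}
```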

    Everything usually works fine on Linux, because almost everyone uses UTF-8-based locales. As a result, all user input and the arguments passed to main are UTF-8 encoded. You might still need to switch the current locales to UTF-8 variants explicitly, because by default a C++ program starts in the "C" locale. At this point, if you only care about Linux and don't need to support Windows, you can use char arrays and std::string, assume they hold UTF-8 sequences, and everything "just works".
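    Byte-oriented std::string code "just works" with UTF-8 for a concrete reason, which the following sketch of a path splitter illustrates. split_path is a hypothetical helper, and it never decodes characters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a path on '/' by looking only at raw bytes. This is safe for
// UTF-8: every byte of a multi-byte sequence has its high bit set, so an
// ASCII byte such as '/' can never appear inside another character.
std::vector<std::string> split_path(const std::string& utf8_path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : utf8_path) {
        if (c == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}
```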

    Problems appear when you want to support Windows, because there you always have an additional, third locale: the one set for the current user, which can be configured somewhere in the "Control Panel". The main issue is that this locale is never a Unicode locale. It is therefore impossible to use functions like std::fopen(const char *) and std::fstream::open(const char *) to open a file with a Unicode path. On Windows you have to use custom wrappers around non-standard, Windows-specific functions such as _wfopen and std::fstream::open(const wchar_t *) (an MSVC extension). Boost.Nowide (part of Boost since version 1.73) shows how this can be done: http://cppcms.com/files/nowide/html/
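    The core of such a wrapper is small. The sketch below follows the spirit of Boost.Nowide but is not its actual API; fopen_utf8 and widen_utf8 are hypothetical names. It accepts UTF-8 on every platform. On Windows it converts the string to UTF-16 with MultiByteToWideChar and calls _wfopen. Elsewhere it passes the bytes to std::fopen unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Converts a UTF-8 string to UTF-16 (wchar_t on Windows). Because an
// explicit input length is passed, the output is not NUL-terminated.
static std::wstring widen_utf8(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                        static_cast<int>(utf8.size()), nullptr, 0);
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), static_cast<int>(utf8.size()),
                        &wide[0], len);
    return wide;
}
#endif

// Hypothetical helper: opens a file given a UTF-8 path on every platform.
std::FILE* fopen_utf8(const std::string& path, const std::string& mode) {
#ifdef _WIN32
    // Bypass the ANSI code page entirely by using the wide CRT function.
    return _wfopen(widen_utf8(path).c_str(), widen_utf8(mode).c_str());
#else
    // POSIX: the bytes are handed to the system call unchanged.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```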

    With C++17 you can use std::filesystem::path to store a file path in a portable way, but it is still broken on Windows:

    • The implicit constructor std::filesystem::path::path(const char *) uses the user-specific locale on MSVC, and there is no way to make it use UTF-8. The function std::filesystem::u8path should be used to construct a path from a UTF-8 string, but it is too easy to forget about this and use the implicit constructor instead.
    • std::error_category::message(int) for both standard error categories (std::generic_category and std::system_category) returns the error description in the user-specific encoding.
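    The following sketch makes the UTF-8 conversions explicit in both directions. path_from_utf8 and path_to_utf8 are hypothetical helpers. In C++17 the sketch relies on std::filesystem::u8path. C++20 deprecates u8path and instead lets you pass a std::u8string to the path constructor. The sketch also relies on path::u8string, whose return type changed from std::string to std::u8string in C++20.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Builds a path from a UTF-8 string without going through the
// implicit const char* constructor.
fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_char8_t)
    return fs::path(std::u8string(utf8.begin(), utf8.end()));  // C++20
#else
    return fs::u8path(utf8);  // C++17
#endif
}

// Converts a path back to UTF-8. Copying through iterators works whether
// u8string() returns std::string (C++17) or std::u8string (C++20).
std::string path_to_utf8(const fs::path& p) {
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

    A thin wrapper class of the kind recommended below would route every conversion through helpers like these, so the implicit constructor is never reached by accident.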

    So what we have on Windows is:

    • Standard library functions that open files are broken and should never be used.
    • Arguments passed to main(int, char**) are broken and should never be used.
    • WinAPI functions ending in A (such as CreateFileA), and the macros that resolve to them, are broken and should never be used.
    • std::filesystem::path is partially broken and should never be used directly.
    • Error categories returned by std::generic_category and std::system_category are broken and should never be used.
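    For the main(int, char**) problem, one common workaround on Windows is to ignore argv and re-parse the original UTF-16 command line with GetCommandLineW and CommandLineToArgvW, converting each argument to UTF-8. The following is a sketch of that approach; utf8_args and narrow_to_utf8 are hypothetical helpers. On other platforms the sketch assumes argv is already UTF-8 and simply copies it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW; link with shell32

// Converts a NUL-terminated UTF-16 string to UTF-8.
static std::string narrow_to_utf8(const wchar_t* wide) {
    const int len = WideCharToMultiByte(CP_UTF8, 0, wide, -1, nullptr, 0,
                                        nullptr, nullptr);
    if (len <= 1) return std::string();
    std::string out(static_cast<std::size_t>(len), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide, -1, &out[0], len, nullptr, nullptr);
    out.resize(static_cast<std::size_t>(len) - 1);  // drop the terminating NUL
    return out;
}
#endif

// Hypothetical helper: returns the command-line arguments as UTF-8.
std::vector<std::string> utf8_args(int argc, char** argv) {
    std::vector<std::string> args;
#ifdef _WIN32
    // argv has already been converted to the ANSI code page, so it is
    // ignored. The original UTF-16 command line is re-parsed instead.
    (void)argc;
    (void)argv;
    int wargc = 0;
    LPWSTR* wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv != nullptr) {
        for (int i = 0; i < wargc; ++i) args.push_back(narrow_to_utf8(wargv[i]));
        LocalFree(wargv);
    }
#else
    // Elsewhere argv is assumed to be UTF-8 already.
    for (int i = 0; i < argc; ++i) args.emplace_back(argv[i]);
#endif
    return args;
}
```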

    If you need a long-term solution for a non-trivial project, I would recommend:

    • Using Boost.Nowide or implementing similar functionality directly: this fixes the broken standard library functions.
    • Re-implementing the standard error categories returned by std::generic_category and std::system_category so that they always return UTF-8 encoded strings.
    • Wrapping std::filesystem::path so that the new class always uses UTF-8 when converting a path to a string and a string to a path.
    • Wrapping all required functions from std::filesystem so that they use your path wrapper and your error categories.
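    The second recommendation could look like the following sketch of a replacement for std::system_category whose message() returns UTF-8. utf8_system_category and its implementation class are hypothetical names. On Windows the sketch asks FormatMessageW for the UTF-16 text and converts it with WideCharToMultiByte. On other platforms it assumes the standard message is already UTF-8 (as under a UTF-8 locale) and delegates to it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <system_error>

#ifdef _WIN32
#include <windows.h>
#endif

class utf8_system_category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "utf8_system"; }

    // Keep the standard mapping to portable std::errc conditions.
    std::error_condition default_error_condition(int ev) const noexcept override {
        return std::system_category().default_error_condition(ev);
    }

    std::string message(int ev) const override {
#ifdef _WIN32
        wchar_t* buffer = nullptr;
        const DWORD len = FormatMessageW(
            FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
                FORMAT_MESSAGE_IGNORE_INSERTS,
            nullptr, static_cast<DWORD>(ev), 0,
            reinterpret_cast<LPWSTR>(&buffer), 0, nullptr);
        if (len == 0 || buffer == nullptr) return "unknown error " + std::to_string(ev);
        const int n = WideCharToMultiByte(CP_UTF8, 0, buffer, static_cast<int>(len),
                                          nullptr, 0, nullptr, nullptr);
        std::string out(static_cast<std::size_t>(n), '\0');
        WideCharToMultiByte(CP_UTF8, 0, buffer, static_cast<int>(len), &out[0], n,
                            nullptr, nullptr);
        LocalFree(buffer);  // buffer was allocated by FormatMessageW
        return out;
#else
        return std::system_category().message(ev);
#endif
    }
};

// Returns the single shared instance, as std::system_category() does.
const std::error_category& utf8_system_category() {
    static utf8_system_category_impl instance;
    return instance;
}
```

    Delegating default_error_condition keeps comparisons such as ec == std::errc::no_such_file_or_directory working with the new category.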

    Unfortunately, this won't fix issues with other libraries that work with files, but 99% of them are broken anyway (they do not support Unicode).

    Such is the life of a C++ programmer. Microsoft could fix this by allowing us to switch the Windows runtime to a UTF-8-based locale, but they won't because of backward compatibility.

    You can check this link for further explanation: http://utf8everywhere.org/
