Storing unicode UTF-8 string in std::string

Posted by 这一生的挚爱 on 2019-11-28 21:33:33

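With C++11 through C++17 this is easy, because a u8 literal is always encoded as UTF-8 (in C++20 its type changes to const char8_t[], so it no longer initializes a std::string directly):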

std::string msg = u8"महसुस";

But since you are not, you can use escape sequences instead of relying on the source file's charset to manage the encoding for you. This makes your code more portable, because it still works if you accidentally save the file in a non-UTF-8 encoding:

std::string msg = "\xE0\xA4\xAE\xE0\xA4\xB9\xE0\xA4\xB8\xE0\xA5\x81\xE0\xA4\xB8"; // "महसुस"
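To check that such a literal really holds the intended UTF-8 bytes, it can help to dump the string byte by byte. The following is only a minimal sketch; hexDump is a hypothetical helper name, not part of any library, and it sticks to C++98 so it works without C++11.

```cpp
#include <string>

// Hypothetical helper (not from the original answer): renders each byte of s
// as two uppercase hex digits, separated by single spaces.
std::string hexDump(const std::string &s)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char b = static_cast<unsigned char>(s[i]);
        if (i != 0)
            out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Calling hexDump(msg) on the string above should give back the same fifteen byte values that were written as escapes, and msg.size() reports bytes, not characters. One pitfall with this style: a \x escape consumes every hex digit that follows it, so an escape followed directly by a literal character such as 'a' or '1' needs the literal split in two, for example "\xE0" "a".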

Otherwise, you might consider doing a conversion at runtime instead:

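#include <windows.h>
#include <string>

// Converts a UTF-16 std::wstring to a UTF-8 std::string (Windows only).
std::string toUtf8(const std::wstring &str)
{
    std::string ret;
    // First call: ask how many bytes the UTF-8 result needs.
    int len = WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), NULL, 0, NULL, NULL);
    if (len > 0)
    {
        ret.resize(len);
        // Second call: convert into the buffer we just sized.
        WideCharToMultiByte(CP_UTF8, 0, str.c_str(), static_cast<int>(str.length()), &ret[0], len, NULL, NULL);
    }
    return ret;
}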

std::string msg = toUtf8(L"महसुस");
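WideCharToMultiByte is a Windows API, so the conversion above does not build elsewhere. As a rough, portable illustration of what the encoding step does, here is a hand-written sketch that appends one Unicode code point to a std::string as UTF-8. This is a different technique from the answer's, and appendUtf8 is a hypothetical name chosen for this example.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of code point cp to out.
// Returns false (and leaves out unchanged) for surrogates and values above 0x10FFFF.
bool appendUtf8(std::string &out, unsigned long cp)
{
    if (cp <= 0x7F)
    {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    }
    else if (cp <= 0x7FF)
    {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0xFFFF)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false; // surrogate halves are not valid code points
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0x10FFFF)
    {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        return false;
    }
    return true;
}
```

For example, appending U+092E (म) produces the three bytes E0 A4 AE, which matches the start of the escaped literal shown earlier. The sketch takes code points, not UTF-16 units, so a wchar_t string on Windows would first need its surrogate pairs combined.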

In the Visual Studio debugger, you can enter msg.c_str(),s8 in the Watch window to see the UTF-8 string displayed correctly.

If you have C++11, you can write u8"महसुस". Otherwise, you'll have to write the actual byte sequence, using a \xNN hex escape for each byte in the UTF-8 sequence.

Typically, you're better off reading such text from a configuration file.
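As a minimal sketch of that approach, the text can be loaded at runtime as raw bytes, so the compiler's source charset never gets involved. The function name readUtf8File and the decision to strip a leading UTF-8 BOM are assumptions made for this example, not something the answer specifies.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file at path into out as raw bytes.
// Returns false if the file cannot be opened.
bool readUtf8File(const char *path, std::string &out)
{
    // Binary mode avoids newline translation; the bytes are never reinterpreted.
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        return false;

    std::ostringstream buf;
    buf << in.rdbuf();
    out = buf.str();

    // Assumption: drop a UTF-8 byte order mark (EF BB BF) if one is present.
    if (out.size() >= 3 && out.compare(0, 3, "\xEF\xBB\xBF") == 0)
        out.erase(0, 3);
    return true;
}
```

The function does not validate that the contents are well-formed UTF-8; it simply trusts the file to be saved in that encoding.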

There is a way to display the right values in the debugger thanks to the 's8' format specifier. If we append ',s8' to a variable name, Visual Studio reinterprets the text as UTF-8 and renders it correctly.

If you are using Microsoft Visual Studio 2008 Service Pack 1, you need to apply this hotfix:

http://support.microsoft.com/kb/980263

If you set the system locale to English, and the file is in UTF-8 without BOM, VC will let you store the string as-is. I have written an article about this here.
