Convert UTF-16 to UTF-8

橙三吉。 提交于 2019-12-04 15:02:18

The problem is you specified the WC_ERR_INVALID_CHARS flag:

Windows Vista and later: Fail if an invalid input character is encountered. If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION. Note that this flag only applies when CodePage is specified as CP_UTF8 or 54936 (for Windows Vista and later). It cannot be used with other code page values.

Your conversion function seems quite long. How does this one work for you?

//----------------------------------------------------------------------------
// FUNCTION: ConvertUTF16ToUTF8
// DESC: Converts Unicode UTF-16 (Windows default) text to Unicode UTF-8.
//----------------------------------------------------------------------------
CStringA ConvertUTF16ToUTF8( __in LPCWSTR pszTextUTF16 ) {
    if (pszTextUTF16 == NULL) return "";

    int utf16len = wcslen(pszTextUTF16);
    int utf8len = WideCharToMultiByte(CP_UTF8, 0, pszTextUTF16, utf16len, 
        NULL, 0, NULL, NULL );

    CArray<CHAR> buffer;
    buffer.SetSize(utf8len+1);
    buffer.SetAt(utf8len, '\0');

    WideCharToMultiByte(CP_UTF8, 0, pszTextUTF16, utf16len, 
        buffer.GetData(), utf8len, 0, 0 );

    return buffer.GetData();
}

I see you use a function called StringCchLengthW to get the required length of the output buffer. Most of the places I look recommend using the WideCharToMultiByte function itself to tell you how many CHARs it wants.

Edit:
As Rob pointed out, you can use CW2A with the CP_UTF8 code page:

CStringA str = CW2A(wStr, CP_UTF8);

While I'm editing, I can answer your second question:

How can I verify the resultant UTF-8 string is correct?

Write it to a text file, then open it in Mozilla Firefox or an equivillant program. In the View menu, you can go to Character Encoding and switch manually to UTF-8 (assuming Firefox didn't guess it correctly to begin with). Compare it with a UTF-16 document with the same text and see if there are any differences.

Rob

You can also use the ATL String Conversion Macros - to convert from UTF-16 to UTF-8 use CW2A and pass CP_UTF8 as the code page, e.g.:

CW2A utf8(buffer, CP_UTF8);
const char* data = utf8.m_psz;
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!