Chinese character in source code when UTF-8 settings can't be used [duplicate]

Question

This is the scenario:

I can only use the char * data type for the string, not wchar_t *.
My MS Visual C++ compiler has to be set to MBCS, not UNICODE, because the third-party source code I depend on uses MBCS; setting it to UNICODE causes data type issues.
I am trying to print Chinese characters on a printer, which needs to receive a character string in order to print correctly.

What should I do with this line to make the code correct: char * str = "你好";

Convert it to a hex sequence, perhaps? If so, how? Thanks a lot.

char * str = "你好";
size_t len = strlen(str) + 1;


wchar_t * wstr = new wchar_t[len];
size_t convertedSize  = 0;
mbstowcs_s(&convertedSize, wstr, len, str, _TRUNCATE);
cout << convertedSize;

if(! ExtTextOutW(resource->dc, 1,1 , ETO_OPAQUE, NULL, wstr ,  convertedSize, NULL))
{
  return 0;
}

UPDATE: Let me put the question another way.

I have the code below, where char * str contains the sequence of UTF-8 code units for the two Chinese characters 你好. ExtTextOutW still does not render wstr correctly, and I suspect that my call to mbstowcs_s is still not working correctly. Any idea why?

char * str = "\xE4\xBD\xA0\xE5\xA5\xBD";
size_t len = strlen(str) + 1;
wchar_t * wstr = new wchar_t[len];
size_t convertedSize = 0;
mbstowcs_s(&convertedSize, wstr, len, str, _TRUNCATE);
if (!ExtTextOutW(resource->dc, 1, 1, ETO_OPAQUE, NULL, wstr, len, NULL))
{
    return 0;
}

Answer 1:

The fact is, 你好 is a sequence of Unicode characters. You will need to use a Unicode character set in order to ensure that it will be displayed correctly.

The only possible exception is a multi-byte character set that includes both of these characters in its basic character set. Since you say that you're stuck compiling for MBCS anyway, that might be a solution. To make it work, you will have to set the system language to one that includes these characters. The exact way you do this changes in each OS version. I think they're trying to "improve" it. On Windows 7, at least, they call this the "Language for non-Unicode programs" setting, accessible in the "Region and Language" control panel.

If there is no system language in which these characters are provided as part of the basic character set, then you are basically out of luck.

Even if you tried to use a UTF-8 encoding (which Windows does not natively support, instead preferring UTF-16 for its Unicode support), which uses the char data type, it is very likely that whatever other application/library you're interfacing with would not be able to deal with it. Windows applications assume that a char holds a character in the current ANSI/MB character set. Unicode characters are in a wchar_t, and since you can't use that, it indicates the application simply doesn't support Unicode. (That means it's broken, by the way—time to upgrade.)
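To illustrate the UTF-8 versus UTF-16 point for the byte sequence in the question's update: mbstowcs_s interprets its input according to the current C locale, not as UTF-8, so the bytes have to be decoded as UTF-8 explicitly. On Windows, MultiByteToWideChar with CP_UTF8 performs this conversion. The following is only a minimal portable sketch of the same idea; the helper name utf8_to_utf16 is hypothetical (it is not part of the original post or of any library), and it assumes reasonably well-formed input rather than validating everything.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): decodes UTF-8 bytes into
// UTF-16 code units. Assumption: input is mostly well-formed. Decoding stops
// at the first invalid lead byte or broken continuation byte; overlong forms
// and out-of-range values are not rejected.
std::u16string utf8_to_utf16(const char* utf8) {
    assert(utf8 != nullptr);
    std::u16string out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);
    while (*p != 0) {
        char32_t cp = 0;   // code point being assembled
        int extra = 0;     // number of continuation bytes expected
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else break;        // invalid lead byte
        ++p;
        bool ok = true;
        for (int i = 0; i < extra; ++i, ++p) {
            // A terminating NUL also fails this check, so we never read past it.
            if ((*p & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (!ok) break;
        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

For the question's string "\xE4\xBD\xA0\xE5\xA5\xBD" this yields two code units, U+4F60 and U+597D. Note that the character count passed to a wide text API such as ExtTextOutW would then be the size of the converted string (2 here), not strlen(str) + 1, which counts UTF-8 bytes plus the terminator.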

Answer 2:

As an adaptation from what MYMNeo said, I would suggest that this would work:

wchar_t *str = L"你好";
fputws(str, stdout);

P.S. This probably isn't C: cout << convertedSize;
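A wide literal containing 你好 directly still depends on the compiler reading the source file in the right encoding. As a possible variation, closer to the "hex sequence" idea in the question, the same literal can be written with universal character names so that the source file stays pure ASCII. This is a sketch only; the helper names greeting_wide and wide_length are hypothetical and exist just for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: returns the greeting as a wide string literal whose
// source text is pure ASCII. \u4F60 is 你 and \u597D is 好.
const wchar_t* greeting_wide() {
    return L"\u4F60\u597D";
}

// Hypothetical helper: number of wchar_t units in the string, excluding the
// terminator. This is the kind of count a wide text API expects.
std::size_t wide_length(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);
}
```

Both characters are in the Basic Multilingual Plane, so the string has two wchar_t units whether wchar_t is 16 bits (as on Windows) or 32 bits. Whether fputws then shows the characters correctly on a console still depends on the locale and the output device.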

Source: https://stackoverflow.com/questions/15962259/chinese-character-in-source-code-when-utf-8-settings-cant-be-used

Tags

c++

visual-c++

character-encoding

cjk

multibyte