Does the multibyte-to-wide-string conversion function “mbstowcs”, when passed a string literal, use the encoding of the source file?

轮回少年 2020-12-17 03:13

ADDENDUM A tentative answer of my own appears at the bottom of the question.

I am converting an archaic VC6 C++/MFC project to VS2013 and Unic

2 Answers

盖世英雄少女心 (OP)

2020-12-17 03:38
The encoding of the source code file doesn't affect the behavior of mbstowcs. After all, the internal implementation of the function is unaware of what source code might be calling it.

The MSDN documentation you linked says:

mbstowcs uses the current locale for any locale-dependent behavior; _mbstowcs_l is identical except that it uses the locale passed in instead. For more information, see Locale.

That linked page about locales then references setlocale, which is how the behavior of mbstowcs can be changed.
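To make the locale dependence concrete, here is a minimal sketch. The helper name `widen_with_current_locale` is made up for illustration; only `std::mbstowcs` and `std::setlocale` are real library calls. It converts with whatever locale is active at the time of the call, so the same bytes can decode differently depending on an earlier `setlocale`.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of any library): converts a narrow string
// using whatever locale is active when it is called.
// Returns an empty string if the bytes are invalid in that locale.
std::wstring widen_with_current_locale(const std::string& narrow) {
    // A multibyte string never yields more wide characters than it has bytes,
    // so narrow.size() is a safe upper bound for the destination.
    std::wstring wide(narrow.size(), L'\0');
    std::size_t written = std::mbstowcs(&wide[0], narrow.c_str(), wide.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();  // conversion failed under the current locale
    }
    wide.resize(written);
    return wide;
}
```

Calling `std::setlocale(LC_ALL, "C")` or `std::setlocale(LC_ALL, "")` before the helper changes how bytes above 0x7F are interpreted; the source file's encoding plays no part at run time. Note that MSVC flags `mbstowcs` with a deprecation warning and suggests `mbstowcs_s`.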

Now, taking a look at your proposed way of passing UTF-8:
```
mbstowcs (dest, u8"Hello, world!", 1024);
```
Unfortunately, as far as I know, that isn't going to work properly once you use interesting data. If it even compiles, it only does so because the compiler treats a u8 literal the same as a char*. And as far as mbstowcs is concerned, it will assume the string is encoded in whatever encoding the current locale specifies.

Even more unfortunately, I don't believe there's any way (on the Windows / Visual Studio platform, at least with VS2013) to set a locale such that UTF-8 would be used.

So that would happen to work for ASCII characters (the first 128 characters) only because they happen to have the exact same binary values in various ANSI encodings as well as UTF-8. If you try with any characters beyond that (for instance anything with an accent or umlaut) then you'll see problems.
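A small sketch of why ASCII survives and accented characters do not. The helper `widen_bytewise` is hypothetical and only mimics what a single-byte (Latin-1-like) interpretation does: one byte in, one wide character out. It is not how mbstowcs is implemented, just an illustration of the mismatch.

```cpp
#include <string>

// Hypothetical helper: widens every byte to one wide character, i.e. what a
// single-byte code page interpretation of the input amounts to.
std::wstring widen_bytewise(const std::string& bytes) {
    std::wstring wide;
    wide.reserve(bytes.size());
    for (unsigned char b : bytes) {
        wide.push_back(static_cast<wchar_t>(b));
    }
    return wide;
}
```

With this, `widen_bytewise("Hello")` gives `L"Hello"`, because ASCII bytes have the same values in UTF-8 and in the ANSI code pages. The UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9, so `widen_bytewise("\xC3\xA9")` gives two wide characters (0x00C3 and 0x00A9, displayed as "Ã©") instead of the single character 0x00E9.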

Personally, I think mbstowcs and such are rather limited and clunky. I've found the Windows API function MultiByteToWideChar to be more effective in general. In particular, it can easily handle UTF-8 just by passing CP_UTF8 for the code page parameter.
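A minimal sketch of that approach, assuming a Windows build. The wrapper name `utf8_to_wide` is made up for illustration; `MultiByteToWideChar`, `CP_UTF8` and `MB_ERR_INVALID_CHARS` come from the Windows API. The function is called twice: once with a zero-sized destination to get the required length, then again to do the conversion.

```cpp
#ifdef _WIN32  // MultiByteToWideChar exists only on Windows
#include <windows.h>
#include <string>

// Hypothetical wrapper: decodes UTF-8 bytes into a UTF-16 std::wstring.
// Returns an empty string for empty input or if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) {
        return std::wstring();
    }
    const int srcLen = static_cast<int>(utf8.size());

    // First call: a destination size of 0 returns the number of wide
    // characters required, without writing anything.
    const int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), srcLen, nullptr, 0);
    if (needed <= 0) {
        return std::wstring();
    }

    // Second call: perform the conversion into a buffer of the right size.
    std::wstring wide(static_cast<std::size_t>(needed), L'\0');
    const int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), srcLen,
                                            &wide[0], needed);
    if (written <= 0) {
        return std::wstring();
    }
    return wide;
}
#endif
```

Because the code page is passed explicitly, the result does not depend on the process locale, which is the main practical difference from mbstowcs.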