Here are some excerpts from my copy of the 2014 draft standard N4140:

> **22.5 Standard code conversion facets [locale.stdcvt]**
>
> 3 For the facet `codecvt_utf8`:
> — The facet shall convert between UTF-8 multibyte sequences and UCS2 or UCS4 (depending on the size of `Elem`) within the program.
The first interpretation is conditionally true: if the `__STDC_ISO_10646__` macro (imported from C) is defined, then the encoding of `wchar_t` is a superset of some version of Unicode.
> `__STDC_ISO_10646__`
>
> An integer literal of the form `yyyymmL` (for example, `199712L`). If this symbol is defined, then every character in the Unicode required set, when stored in an object of type `wchar_t`, has the same value as the short identifier of that character. The Unicode required set consists of all the characters that are defined by ISO/IEC 10646, along with all amendments and technical corrigenda as of the specified year and month.
It appears that if the macro is defined, some kind of UCS4 can be assumed. (Not UCS2: ISO 10646 never had a 16-bit version; the first release of ISO 10646 corresponds to Unicode 2.0.) So if the macro is defined, `codecvt_utf8` is compatible with this native encoding. None of this is required to hold if the macro is not defined.
There are also `__STDC_UTF_16__` and `__STDC_UTF_32__`, but the C++ standard doesn't say what they mean. The C standard says that they signify UTF-16 and UTF-32 encodings for `char16_t` and `char32_t` respectively, but in C++ these encodings are always used.
Incidentally, the functions `mbrtoc32` and `c32rtomb` convert back and forth between `char` sequences and `char32_t` sequences. In C they use UTF-32 only if `__STDC_UTF_32__` is defined, but in C++ UTF-32 is always used for `char32_t`. So it would appear that even if `__STDC_ISO_10646__` is not defined, it should be possible to convert between UTF-8 and `wchar_t` by going from UTF-8 to UTF-32-encoded `char32_t`, then to natively encoded `char`, then to natively encoded `wchar_t`; but I'm afraid of this complex stuff.