Difference between MBCS and UTF-8 on Windows

醉话见心 2020-11-28 03:23

I am reading about character sets and encodings on Windows. I noticed that there are two compiler flags in the Visual Studio compiler (for C++) called MBCS and UNICODE. What i

4 answers
  •  不知归路
    2020-11-28 04:04

    MBCS means Multi-Byte Character Set and describes any character set where a character is encoded into (possibly) more than 1 byte.
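
    As a rough illustration of "possibly more than 1 byte", the sketch below measures how many bytes single characters take in one multi-byte character set. Code page 932 (Shift-JIS) is picked here only as an example, and `encoded_length` is a hypothetical helper name, not something from the question.

```python
def encoded_length(ch: str, codec: str) -> int:
    """Return how many bytes `codec` needs for the single character `ch`."""
    return len(ch.encode(codec))

# Assumption for illustration: code page 932 (Shift-JIS) as the multi-byte set.
latin_a = "A"          # U+0041 LATIN CAPITAL LETTER A
hiragana_a = "\u3042"  # U+3042 HIRAGANA LETTER A

mbcs_lengths = [encoded_length(ch, "cp932") for ch in (latin_a, hiragana_a)]
print(mbcs_lengths)  # [1, 2]: the same character set mixes 1-byte and 2-byte characters
```

    The point is that the byte count varies per character, so code that walks such a string cannot assume one byte per character.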

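    ASCII is not multi-byte, and neither are the Western "ANSI" code pages such as Windows-1252: every character is exactly 1 byte. Note, however, that on Windows "ANSI" just means the system code page, and the East Asian code pages (932, 936, 949, 950) are multi-byte.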

    UTF-8, however, is a multi-byte encoding. It encodes any Unicode character as a sequence of 1, 2, 3, or 4 octets (bytes).

    However, UTF-8 is only one out of several possible concrete encodings of the Unicode character set. Notably, UTF-16 is another, and happens to be the encoding used by Windows / .NET (IIRC). Here's the difference between UTF-8 and UTF-16:

    • UTF-8 encodes any Unicode character as a sequence of 1, 2, 3, or 4 bytes.

    • UTF-16 encodes Unicode characters in the Basic Multilingual Plane (U+0000 to U+FFFF) as 2 bytes, and all other characters as 4 bytes (a surrogate pair).
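
    The two bullets above can be checked with a small sketch. The sample characters and the helper name `utf_lengths` are arbitrary choices for illustration; `utf-16-le` is used so that no byte order mark is counted.

```python
def utf_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

samples = [
    "A",           # U+0041, ASCII range
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]

for ch in samples:
    utf8_len, utf16_len = utf_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```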

    It is therefore not correct that Unicode is a 16-bit character encoding. It is closer to a 21-bit character set: its code points run from U+0000 up to U+10FFFF, and the value 0x10FFFF needs 21 bits.
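
    To make the arithmetic concrete, here is a minimal sketch that checks the bit width of the largest code point and shows how UTF-16 splits a code point above U+FFFF into two 16-bit units. `to_surrogate_pair` is a hypothetical helper written for this illustration; in practice the encoder does this for you.

```python
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()  # number of bits to represent the largest code point

def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= MAX_CODE_POINT:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(bits_needed, hex(high), hex(low))  # 21 0xd83d 0xde00
```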
