What is a multibyte character set?

时光说笑 2020-12-03 04:50

Does the term multibyte refer to a charset whose characters can, but don't have to, be wider than 1 byte (e.g. UTF-8), or does it refer to character sets which are in any

9 answers
  •  情歌与酒
    2020-12-03 05:05

    UTF-8 is a multibyte encoding: each ASCII character (which covers the English letters) is stored in 1 byte, while every other character takes 2 to 4 bytes. Most Chinese and Thai characters take 3 bytes. When you mix Thai with English, as in "ทt", the Thai character "ท" uses 3 bytes while the English character "t" uses only 1. The designers of multibyte encodings recognized that storing an English character in 3 bytes when it fits in 1 would waste storage space.
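The byte counts are easy to check. The following is a minimal Python sketch; the helper name `utf8_len` and the sample characters are made up for illustration and are not part of the original answer.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# One sample per UTF-8 length class, from ASCII up to an emoji.
samples = ["t", "é", "ท", "中", "😀"]
for ch in samples:
    # Print the code point rather than the glyph to stay console-safe.
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")

# The mixed string from the answer: 2 characters, 3 + 1 = 4 bytes.
mixed = "ทt"
print(len(mixed), "characters,", len(mixed.encode("utf-8")), "bytes")
```

The loop reports 1, 2, 3, 3 and 4 bytes respectively, so "ทt" is two characters but four bytes.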

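    UTF-16 stores most characters, English or not, in a single 2-byte code unit, so on Windows, where wchar_t is 16 bits, it is handled as wide characters rather than as a multibyte string. Strictly speaking it is still variable-width: characters outside the Basic Multilingual Plane, such as emoji and some rarer Chinese characters, need two code units (4 bytes). It suits Chinese and Thai text well, since most of those characters fit in 2 bytes, but printing to a UTF-8 console requires a conversion from wide characters to the multibyte format, for example with wcstombs(), which converts according to the current locale.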
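One caveat worth checking for UTF-16 is how many 16-bit units a character really needs. The sketch below is in Python rather than C, so it does not call wcstombs(); the decode-then-encode step is only an analogue of that conversion, and `utf16_units` is a hypothetical helper name.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units the character needs in UTF-16."""
    # "utf-16-le" omits the byte order mark, so only the payload is counted.
    return len(ch.encode("utf-16-le")) // 2

print(utf16_units("t"))   # 1 unit, 2 bytes
print(utf16_units("ท"))   # 1 unit, 2 bytes
print(utf16_units("😀"))  # 2 units (a surrogate pair), 4 bytes

# Converting UTF-16 data to UTF-8: decode to text, then encode again.
utf16_bytes = "ทt".encode("utf-16-le")
utf8_bytes = utf16_bytes.decode("utf-16-le").encode("utf-8")
print(len(utf16_bytes), len(utf8_bytes))  # 4 4
```

Unlike wcstombs(), which depends on the current locale, the target encoding here is named explicitly.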

    UTF-32 stores every character in a fixed 4 bytes, but it is rarely used for storage or transmission because of the wasted space.
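To put numbers on the storage cost, this small Python sketch compares the encoded size of the same text in the three encodings. The function name `encoded_sizes` and the sample strings are assumptions chosen for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Map each encoding name to the byte length of text in that encoding."""
    # The "-le" variants skip the byte order mark.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}

for sample in ("hello", "ทt", "中文"):
    print(encoded_sizes(sample))
```

For "hello" the sizes are 5, 10 and 20 bytes; for "中文" they are 6, 4 and 8. UTF-32 is never the smallest of the three, while UTF-16 can beat UTF-8 for text that is mostly Chinese or Thai.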
