Is there any reason to prefer UTF-16 over UTF-8?

野性不改 2020-12-25 11:39

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16.

However, checking out Java and C#, it looks like strings and chars there def

7 answers
  •  眼角桃花
    2020-12-25 12:31

    UTF-16 can be more efficient for text in languages such as Chinese, Japanese and Korean, where most characters fit in a single 16-bit code unit; only some rarely used characters need two. UTF-8 is generally much more efficient for Western European text, since UTF-8 and ASCII are identical over the ASCII range (0-127), but it is less efficient for Asian languages: most CJK characters take three bytes in UTF-8 versus two in UTF-16. (Characters that need four bytes in UTF-8 also need four bytes in UTF-16, as a surrogate pair.)
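To make the size comparison concrete, here is a minimal sketch that measures the encoded length of a few sample strings. The helper name `encoded_sizes` and the sample strings are made up for illustration. It uses `utf-16-le` on the assumption that the byte-order mark should not be counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" skips the 2-byte byte-order mark that plain "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }


# Illustrative samples: ASCII text, common CJK text, and one character outside the BMP.
samples = {
    "ASCII": "hello",
    "Chinese (BMP)": "你好世界",
    "Outside the BMP": "\U00020000",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

For these samples, the ASCII string is half the size in UTF-8, the Chinese string is a third smaller in UTF-16, and the character outside the BMP takes four bytes in both encodings.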

    UTF-16 has an advantage as an in-memory format for Java/C# in that every character in the Basic Multilingual Plane fits in one 16-bit code unit (see Joe's answer). Some of UTF-16's disadvantages, such as embedded zero bytes confusing code that relies on \0 terminators, are also less relevant there, because strings in those languages carry an explicit length instead of a terminator.
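The two points above can be checked with a short sketch. The helper names `utf16_code_units` and `contains_nul_byte` are hypothetical. Python is used only to inspect the encodings; it does not reproduce Java's or C#'s string implementation.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def contains_nul_byte(data: bytes) -> bool:
    """Report whether the encoded bytes contain a 0x00 byte."""
    return b"\x00" in data


bmp_char = "中"              # U+4E2D, inside the Basic Multilingual Plane
non_bmp_char = "\U0001F600"  # U+1F600, outside the BMP

print(utf16_code_units(bmp_char))      # one code unit
print(utf16_code_units(non_bmp_char))  # two code units (a surrogate pair)

# Plain ASCII text encoded as UTF-16 contains zero bytes, which would
# truncate the string in code that treats \0 as a terminator.
print(contains_nul_byte("A".encode("utf-16-le")))  # True
print(contains_nul_byte("A".encode("utf-8")))      # False
```

A character outside the BMP still takes two code units, so the one-unit-per-character property holds only within the BMP.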
