Is there any reason to prefer UTF-16 over UTF-8?

野性不改 2020-12-25 11:39

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16.

However, checking out Java and C#, it looks like strings and chars there def

7 answers
  • 2020-12-25 12:32

    I imagine C# uses UTF-16 because the Windows NT family of operating systems uses UTF-16 internally.

    I imagine there are two main reasons why Windows NT uses UTF-16 internally:

    • For memory usage: UTF-32 spends 4 bytes on every code point, which wastes a lot of space for most text.
    • For performance: UTF-8 is harder to decode than UTF-16. In UTF-16, a code point is either a Basic Multilingual Plane character (2 bytes) or a surrogate pair (4 bytes), whereas a UTF-8 code point can take anywhere from 1 to 4 bytes.
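The size differences described in the two points above can be checked with a minimal Python sketch. The sample characters and the helper name `encoded_sizes` are illustrative choices, not part of the original answer.

```python
# Sample characters chosen for illustration (one per size class).
SAMPLES = {
    "A": "U+0041, ASCII",
    "é": "U+00E9, Latin-1 Supplement",
    "中": "U+4E2D, CJK, inside the Basic Multilingual Plane",
    "😀": "U+1F600, outside the Basic Multilingual Plane",
}


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes `ch` occupies in each encoding."""
    # The "-le" codecs are used so that no byte order mark is prepended.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch, note in SAMPLES.items():
    print(ch, note, encoded_sizes(ch))
```

UTF-16 only ever yields 2 or 4 bytes per code point, UTF-8 yields 1 to 4, and UTF-32 always yields 4. Note that for ASCII-heavy text UTF-8 is the most compact of the three.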

    Contrary to what other people have answered, you cannot treat UTF-16 as UCS-2. If you want to iterate correctly over the actual characters in a string, you have to use Unicode-aware iteration functions. For example, in C# you need to use StringInfo.GetTextElementEnumerator().
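To illustrate why UTF-16 cannot be treated as a fixed-width encoding, here is a small Python sketch that lists the 16-bit code units of a string. The helper name `utf16_code_units` is hypothetical. The sketch only contrasts code units with code points; it does not group combining sequences into text elements the way the C# enumerator does.

```python
import struct


def utf16_code_units(s: str) -> list:
    """Return the 16-bit code units of `s` when encoded as UTF-16."""
    data = s.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


text = "a😀b"
units = utf16_code_units(text)

print(len(text))                 # number of code points
print(len(units))                # number of UTF-16 code units
print([hex(u) for u in units])   # the emoji appears as a surrogate pair
```

The string holds three code points but four code units, because U+1F600 is stored as the surrogate pair 0xD83D 0xDE00. Code that indexes by code unit, as UCS-2 logic would, can split that pair in half.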

    For further information, this Wikipedia page is worth reading: http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings
