Is there any reason to prefer UTF-16 over UTF-8?

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16.

However, checking out Java and C#, it looks like strings and chars there def

7 Answers

猫巷女王i

2020-12-25 12:32
I imagine C# using UTF-16 derives from the Windows NT family of operating systems using UTF-16 internally.

I imagine there are two main reasons why Windows NT uses UTF-16 internally:
- For memory usage: UTF-32 wastes a lot of space, since every character takes 4 bytes.
- For performance: decoding UTF-8 takes more work than decoding UTF-16. In UTF-16, a character is either a Basic Multilingual Plane character (2 bytes) or a surrogate pair (4 bytes). In UTF-8, a character can take anywhere between 1 and 4 bytes.
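
To make the size comparison concrete, here is a minimal Python sketch that reports how many bytes a single character occupies in each encoding. The helper name `encoded_widths` and the sample characters are my own choices for illustration, not anything from Windows or .NET.

```python
def encoded_widths(ch: str) -> dict:
    """Return how many bytes one character occupies in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" variants are used so that no byte order mark is prepended.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


# One sample per UTF-8 width: ASCII, Latin-1, another BMP character, non-BMP.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_widths(ch))
```

Only the last sample lies outside the Basic Multilingual Plane, so it is the only one that needs a surrogate pair (4 bytes) in UTF-16.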

Contrary to what other people have answered, you cannot treat UTF-16 as UCS-2. If you want to correctly iterate over actual characters in a string, you have to use Unicode-aware iteration functions. For example, in C# you need to use StringInfo.GetTextElementEnumerator().
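
As a rough illustration of why the UCS-2 assumption breaks, the Python sketch below splits a string into UTF-16 code units and then recombines surrogate pairs into code points. The function names are hypothetical, and this only handles surrogate pairs; text elements as enumerated by StringInfo go further and also group combining sequences, which this sketch does not attempt.

```python
import struct


def utf16_code_units(text: str) -> list:
    """Encode text as UTF-16 (little-endian, no BOM) and return 16-bit units."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def decode_code_points(units: list) -> list:
    """Combine surrogate pairs in a list of UTF-16 code units into code points."""
    points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            # Each surrogate carries 10 bits of the value above 0x10000.
            points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # A BMP character (or an unpaired surrogate, passed through as-is).
            points.append(unit)
            i += 1
    return points


text = "a😀b"
units = utf16_code_units(text)
points = decode_code_points(units)

print([hex(u) for u in units])   # four code units
print([hex(p) for p in points])  # three code points
```

Code that indexes the string by 16-bit unit, as a UCS-2 reader would, sees four "characters" here and splits the emoji in half.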

For further information, this Wikipedia page is worth reading: http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings