Do UTF-8, UTF-16, and UTF-32 differ in the number of characters they can store?

慢半拍i 2020-12-01 02:55

Okay. I know this looks like the typical "Why didn't he just Google it or go to www.unicode.org and look it up?" question, but for such a simple question the ans

6 answers
  •  旧巷少年郎
    2020-12-01 03:19

    All of the UTF-8/16/32 encodings can map all Unicode characters. See Wikipedia's Comparison of Unicode Encodings.
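    To see this for yourself, here is a minimal Python sketch (the helper name encoded_sizes and the sample characters are my own, chosen only for illustration). It round-trips a few code points through all three encodings and reports how many bytes each one uses. The "-le" codec variants are used so that no byte order mark is counted.

```python
# Sample characters from different parts of the Unicode range (illustrative choice).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # U+0041, U+00E9, U+20AC, U+1F600


def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16, and UTF-32 (no BOM)."""
    sizes = {}
    for name in ("utf-8", "utf-16-le", "utf-32-le"):
        data = ch.encode(name)
        # Every encoding must decode back to the same character.
        assert data.decode(name) == ch
        sizes[name] = len(data)
    return sizes


for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

    The three encodings differ in how many bytes a given character takes, not in which characters they can represent. Even the highest code point, U+10FFFF, round-trips through each of them.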

    The IBM article Encode your XML documents in UTF-8 is very helpful; it recommends choosing UTF-8 when you have the choice. The main reasons are wide tool support and the fact that UTF-8 can usually pass through systems that are unaware of Unicode.

    From the section What the specs say in the IBM article:

    Both the W3C and the IETF have recently become more adamant about choosing UTF-8 first, last, and sometimes only. The W3C Character Model for the World Wide Web 1.0: Fundamentals states, "When a unique character encoding is required, the character encoding MUST be UTF-8, UTF-16 or UTF-32. US-ASCII is upwards-compatible with UTF-8 (an US-ASCII string is also a UTF-8 string, see [RFC 3629]), and UTF-8 is therefore appropriate if compatibility with US-ASCII is desired." In practice, compatibility with US-ASCII is so useful it's almost a requirement. The W3C wisely explains, "In other situations, such as for APIs, UTF-16 or UTF-32 may be more appropriate. Possible reasons for choosing one of these include efficiency of internal processing and interoperability with other processes."
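    The US-ASCII compatibility mentioned in the quote is easy to check. A small Python sketch, assuming an arbitrary ASCII-only sample string of my own choosing: the ASCII bytes and the UTF-8 bytes are identical, while UTF-16 produces a different, longer byte sequence.

```python
text = "Hello, XML!"  # ASCII-only sample string (illustrative)

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark

# An ASCII string is byte-for-byte a valid UTF-8 string.
same_as_utf8 = ascii_bytes == utf8_bytes

# UTF-16 uses two bytes per character here, so the bytes differ.
same_as_utf16 = ascii_bytes == utf16_bytes

# Bytes written by an ASCII-only tool can be read back as UTF-8 unchanged.
decoded = ascii_bytes.decode("utf-8")

print(same_as_utf8, same_as_utf16, len(ascii_bytes), len(utf16_bytes))
```

    This is why ASCII-only files and protocols keep working unchanged when a system switches to UTF-8, which is not true of UTF-16 or UTF-32.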
