问题
What is the difference between UTF-32 and UCS-4 ? Isn't UTF-32 supposed to be a fixed-width encoding ?
回答1:
UTF-32
has started as a subset of UCS-4
. Now it is identical except that the UTF-32 standard has additional Unicode semantics. See details on wikipedia:
The original ISO 10646 standard defines a 31-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit friendly code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.
Because only 17 planes are actually in use, all current code points are between 0 and 0x10FFFF. UTF-32 is a subset of UCS-4 that uses only this range. Since the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, UTF-32 will be able to represent all Unicode characters. Accordingly, UCS-4 and UTF-32 are now identical except that the UTF-32 standard has additional Unicode semantics.
However, I am not exactly sure, what additional Unicode semantics
means. Maybe someone can provide a better answer.
回答2:
The Unicode Standard Version 8.0, Appendix C states:
UCS-4 stands for “Universal Character Set coded in 4 octets.” It is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in 10646.
来源:https://stackoverflow.com/questions/30186631/what-is-the-difference-between-utf-32-and-ucs-4