How many characters can UTF-8 encode?

一个人的身影 2020-11-28 01:55

If UTF-8 is 8 bits, doesn't that mean it can represent a maximum of only 256 different characters?

The first 128 code points are the same as in ASCII. But it says UTF-

10 answers
  •  孤街浪徒
    2020-11-28 02:24

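    Unicode and UTF-8 cover exactly the same range. The Unicode code space runs from U+0000 to U+10FFFF, which is 1,114,112 code points and needs 21 bits to represent (2^21 = 2,097,152 is the size of a full 21-bit space, not the number of valid code points). UTF-8 is defined to encode that same range, and both systems reserve the same 'dead' space and restricted zones, such as the surrogate range U+D800..U+DFFF. Far fewer characters are actually assigned: "...as of June 2018 the most recent version, Unicode 11.0, contains a repertoire of 137,439 characters".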

    From the Unicode Standard (Unicode FAQ):

    The Unicode Standard encodes characters in the range U+0000..U+10FFFF, which amounts to a 21-bit code space.
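The arithmetic behind the quoted range can be checked with a short sketch. The variable names are illustrative only; the surrogate range U+D800..U+DFFF is the block of code points that cannot be encoded on their own.

```python
# Size of the Unicode code space U+0000..U+10FFFF (names here are illustrative).
MAX_CODE_POINT = 0x10FFFF

total_code_points = MAX_CODE_POINT + 1          # every value from 0 to 0x10FFFF
surrogates = 0xDFFF - 0xD800 + 1                # reserved for UTF-16 surrogate pairs
scalar_values = total_code_points - surrogates  # code points that can be encoded
bits_needed = MAX_CODE_POINT.bit_length()       # width of the largest code point

print(f"code points:   {total_code_points:,}")
print(f"surrogates:    {surrogates:,}")
print(f"scalar values: {scalar_values:,}")
print(f"bits needed:   {bits_needed}")
```

So the code space needs 21 bits, but it holds 1,114,112 code points rather than the full 2,097,152 values that 21 bits could represent.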

    From the UTF-8 Wikipedia page (Description section):

    Since the restriction of the Unicode code-space to 21-bit values in 2003, UTF-8 is defined to encode code points in one to four bytes, ...
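The "8" in UTF-8 is the size of one code unit (a byte), not the size of a whole character. A minimal sketch with Python's built-in `str.encode` shows the one-to-four-byte lengths; the sample characters and the helper name `utf8_length` are chosen here for illustration only.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# One sample per encoded length: U+0041, U+00E9, U+20AC, U+1F600.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [utf8_length(ch) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8').hex()}")

# The highest code point, U+10FFFF, still fits in four bytes.
highest = chr(0x10FFFF).encode("utf-8")

# A lone surrogate such as U+D800 is not encodable in UTF-8.
try:
    chr(0xD800).encode("utf-8")
    surrogate_rejected = False
except UnicodeEncodeError:
    surrogate_rejected = True
```

Characters in the ASCII range take one byte, which is why the first 128 code points are byte-for-byte identical to ASCII, while the rest use two to four bytes.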
