How many characters can UTF-8 encode?

一个人的身影 2020-11-28 01:55

If UTF-8 is 8 bits, doesn't that mean there can be a maximum of only 256 different characters?

The first 128 code points are the same as in ASCII. But it says UTF-

10 answers
  •  感情败类
    2020-11-28 02:32

    According to this table* UTF-8 should support:

    2^31 = 2,147,483,648 characters

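    However, RFC 3629 restricted the possible values, so now we're capped at 4 bytes, which gives us room for

    2^21 = 2,097,152 values

    RFC 3629 also stops at U+10FFFF, so only 1,114,112 of those are valid code points (1,112,064 excluding the surrogate range).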

    Note that a good chunk of those characters are "reserved" for custom use, which is actually pretty handy for icon-fonts.
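
The arithmetic above can be checked with a short Python sketch. The names `PAYLOAD_BITS`, `capacity`, `valid_scalar_values`, and `encoded_length` are made up for illustration. The payload-bit counts follow the standard UTF-8 bit layout: 7, 11, 16, and 21 bits for 1 to 4 bytes, plus 26 and 31 bits for the obsolete 5- and 6-byte forms.

```python
# Payload bits carried by each UTF-8 sequence length in the original
# 1-to-6-byte design. RFC 3629 later kept only lengths 1 to 4.
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21, 5: 26, 6: 31}


def capacity(max_len):
    """Distinct values the longest allowed form (max_len bytes) can carry."""
    return 2 ** PAYLOAD_BITS[max_len]


def valid_scalar_values():
    """Code points U+0000..U+10FFFF minus the surrogates U+D800..U+DFFF."""
    return (0x10FFFF + 1) - (0xDFFF - 0xD800 + 1)


def encoded_length(ch):
    """Number of bytes Python's UTF-8 encoder uses for one character."""
    return len(ch.encode("utf-8"))


print(capacity(6))            # 2147483648 (2^31, old 6-byte design)
print(capacity(4))            # 2097152 (2^21, 4-byte cap)
print(valid_scalar_values())  # 1112064
# 'A', e-acute, euro sign, and an emoji need 1, 2, 3, and 4 bytes.
print([encoded_length(c) for c in "A\u00e9\u20ac\U0001F600"])  # [1, 2, 3, 4]
```

This also speaks to the original question: the "8" in UTF-8 is the size of one code unit, not of a whole character, so a single character can take anywhere from one to four bytes.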

    * Wikipedia used to show a table with 6 bytes -- they've since updated the article.

    2017-07-11: Corrected for double-counting the same code point encoded with multiple bytes
