How many characters can be mapped with Unicode?

后端 未结 6 757
感情败类
感情败类 2020-11-27 02:48

I am asking for the count of all the possible valid combinations in Unicode with explanation. I know a char can be encoded as 1,2,3 or 4 bytes. I also don\'t understand why

6条回答
  •  野性不改
    2020-11-27 03:23

    To give a metaphorically accurate answer, all of them.

    Continuation bytes in the UTF-8 encodings allow for resynchronization of the encoded octet stream in the face of "line noise". The encoder, merely need scan forward for a byte that does not have a value between 0x80 and 0xBF to know that the next byte is the start of a new character point.

    In theory, the encodings used today allow for expression of characters whose Unicode character number is up to 31 bits in length. In practice, this encoding is actually implemented on services like Twitter, where the maximal length tweet can encode up to 4,340 bits' worth of data. (140 characters [valid and invalid], times 31 bits each.)

提交回复
热议问题