How does a file with Chinese characters know how many bytes to use per character?

误落风尘 2020-12-13 05:05

I have read Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" but still don'

9 answers
  •  暖寄归人
    2020-12-13 05:20

    Essentially, in UTF-8 the leading bits of each byte tell you its role. If a byte begins with 0, it is a complete 7-bit code point. If it begins with 10, it is a continuation byte of a multi-byte code point. Otherwise, the number of leading 1s tells you how many bytes the code point is encoded in.

    The first byte indicates how many bytes encode the code point.

    0xxxxxxx 7 bits of code point encoded in 1 byte

    110xxxxx 10xxxxxx 11 bits of code point encoded in 2 bytes

    1110xxxx 10xxxxxx 10xxxxxx 16 bits of code point encoded in 3 bytes

    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 21 bits of code point encoded in 4 bytes
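
The lead-byte rule described above can be sketched in Python. This is only an illustration, not a full decoder: the function names `utf8_sequence_length` and `split_characters` are made up for this example, and the code assumes the input is already well-formed UTF-8 (it does not reject overlong forms or out-of-range lead bytes).

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes the sequence starting with lead_byte occupies.

    Hypothetical helper: it only inspects the leading bits and does not
    validate the rest of the sequence.
    """
    if (lead_byte & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx
        return 1
    if (lead_byte & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return 2
    if (lead_byte & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return 3
    if (lead_byte & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte; anything else is not a valid lead byte.
    raise ValueError(f"not a UTF-8 lead byte: {lead_byte:#04x}")


def split_characters(data: bytes) -> list:
    """Split well-formed UTF-8 bytes into one bytes object per code point."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    for chunk in split_characters("a中".encode("utf-8")):
        print(len(chunk), chunk.hex(" "), chunk.decode("utf-8"))
```

For the question's case: common Chinese characters lie in the range U+4E00 to U+9FFF, which needs 16 bits, so each one is stored as 3 bytes in UTF-8. For example, "中" (U+4E2D) is encoded as the bytes E4 B8 AD, and the lead byte E4 (1110 0100) is what announces the 3-byte length.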
