How does a file with Chinese characters know how many bytes to use per character?

误落风尘 2020-12-13 05:05

I have read Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" but still don'

9 answers
  •  情话喂你
    2020-12-13 05:18

    The hint is in this sentence here:

    In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes.

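    Every code point up to 127 is stored as a single byte whose top bit is 0. Therefore, when the editor encounters a byte whose top bit is 1, it knows that byte belongs to a multi-byte character. The first (lead) byte of the sequence says how long it is: 110xxxxx starts a 2-byte sequence, 1110xxxx a 3-byte sequence, and 11110xxx a 4-byte sequence. Every byte after the lead byte is a continuation byte of the form 10xxxxxx, so it can never be mistaken for the start of a character. Most Chinese characters fall in the 3-byte range. (The "up to 6 bytes" in the quote reflects the original UTF-8 design; since RFC 3629, UTF-8 is limited to 4 bytes per code point.)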
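
To make the lead-byte rule concrete, here is a minimal Python sketch that splits a UTF-8 byte string into per-character chunks by looking only at the top bits of each lead byte. The function names `utf8_sequence_length` and `split_utf8` are made up for illustration, and the sketch assumes well-formed input: it does not check that the continuation bytes are valid, which a real decoder would have to do.

```python
from typing import List


def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length in bytes of the UTF-8 sequence that starts with lead_byte.

    Hypothetical helper for illustration; it inspects only the lead byte.
    """
    if lead_byte & 0b1000_0000 == 0b0000_0000:  # 0xxxxxxx: ASCII, 1 byte
        return 1
    if lead_byte & 0b1110_0000 == 0b1100_0000:  # 110xxxxx: 2-byte sequence
        return 2
    if lead_byte & 0b1111_0000 == 0b1110_0000:  # 1110xxxx: 3-byte sequence
        return 3
    if lead_byte & 0b1111_1000 == 0b1111_0000:  # 11110xxx: 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx is not used by modern UTF-8.
    raise ValueError(f"not a valid UTF-8 lead byte: {lead_byte:#04x}")


def split_utf8(data: bytes) -> List[bytes]:
    """Split well-formed UTF-8 data into one bytes object per code point."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])  # indexing bytes yields an int
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    encoded = "a中".encode("utf-8")
    for chunk in split_utf8(encoded):
        print(len(chunk), chunk.hex(), chunk.decode("utf-8"))
```

In practice you would let the language do this for you (for example `bytes.decode("utf-8")` in Python); the sketch only shows why no separate "bytes per character" setting is needed: the length is encoded in each character's first byte.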
