How does a file with Chinese characters know how many bytes to use per character?

误落风尘 2020-12-13 05:05

I have read Joel's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" but still don't…

9 answers

情话喂你 (OP)

2020-12-13 05:18

The hint is in this sentence from the article:

In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes.
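To see the quoted rule in action, here is a minimal sketch assuming Python 3.8+ (needed for `bytes.hex` with a separator argument). `utf8_byte_count` is a hypothetical helper, and the sample characters are chosen only for illustration; non-ASCII characters are written as escapes so the script prints only ASCII.

```python
def utf8_byte_count(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode the single character `ch`."""
    return len(ch.encode("utf-8"))


# "A" (ASCII), "é" (U+00E9), two Chinese characters, and an emoji (U+1F600).
samples = ["A", "\u00e9", "\u4e2d", "\u6587", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_byte_count(ch)} byte(s) -> {encoded.hex(' ')}")
```

Only "A" (below 128) fits in one byte; "é" takes 2 bytes, each Chinese character takes 3 (for example, U+4E2D becomes `e4 b8 ad`), and the emoji takes 4.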

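Every code point up to 127 is stored in a single byte whose top bit is 0. Therefore, if the editor encounters a byte whose top bit is 1, it knows that byte belongs to a multi-byte character. The first (lead) byte of such a character also tells the editor how many bytes to read: it starts with as many 1 bits as the character has bytes, followed by a 0 (110xxxxx for 2 bytes, 1110xxxx for 3, 11110xxx for 4), and every following (continuation) byte starts with 10. Commonly used Chinese characters lie in the range U+4E00 to U+9FFF and therefore take 3 bytes each. (The "up to 6 bytes" in the quote reflects the original UTF-8 design; since RFC 3629 in 2003, UTF-8 uses at most 4 bytes per code point.)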
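A minimal sketch, assuming Python 3.6+, of how a decoder can use only the lead-byte bit patterns to split a UTF-8 byte string into characters. `sequence_length` and `split_utf8` are hypothetical helper names, not part of any library. The sketch deliberately skips checks a real decoder performs (overlong encodings, surrogate code points, values above U+10FFFF); in real code, `bytes.decode("utf-8")` handles all of that.

```python
from typing import List


def sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence starting with byte `lead` occupies."""
    if (lead & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: ASCII, 1 byte
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: lead of a 2-byte sequence
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: lead of a 3-byte sequence
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110xxx: lead of a 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte, and 0xF8-0xFF never occur in UTF-8.
    raise ValueError(f"byte 0x{lead:02X} cannot start a UTF-8 character")


def split_utf8(data: bytes) -> List[bytes]:
    """Split UTF-8 bytes into one chunk per character by reading lead bytes."""
    chars = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunk = data[i:i + n]
        # The lead byte promised n bytes; each byte after it must look like 10xxxxxx.
        if len(chunk) != n or any((b & 0b1100_0000) != 0b1000_0000 for b in chunk[1:]):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        chars.append(chunk)
        i += n
    return chars


# "Hi " followed by two Chinese characters (U+4E2D, U+6587), written as escapes.
data = "Hi \u4e2d\u6587".encode("utf-8")
for chunk in split_utf8(data):
    print(len(chunk), chunk.hex(" "))
```

For the two Chinese characters, the lead bytes 0xE4 and 0xE6 both begin with 1110, so each is read as a 3-byte sequence, while "H", "i" and the space are single bytes.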
