UTF-8 Character Count

时光说笑 2021-01-23 10:17

I'm writing a program that counts the number of UTF-8 characters in a file. I've already written the base code, but now I'm stuck on the part where the characters are su

4 answers

情深已故 (OP)

2021-01-23 10:47
See: https://en.wikipedia.org/wiki/UTF-8#Encoding

Each UTF-8 sequence consists of one lead byte followed by zero to three continuation bytes. Continuation bytes always start with the bits 10, and a lead byte never starts with those bits. You can use that to count only the lead byte of each UTF-8 sequence, which gives one count per encoded code point.
```
    /* b is the current byte; skip continuation bytes (10xxxxxx) */
    if ((b & 0xC0) != 0x80) {
        count++;
    }
```
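To show how the check above might fit into a complete program, here is a minimal sketch in C. The function names `utf8_count_buf` and `utf8_count_stream` and the 4096-byte buffer size are assumptions made for illustration; they do not come from the question. The sketch assumes the input is valid UTF-8 and that the file was opened in binary mode.

```c
#include <stddef.h>
#include <stdio.h>

/* Count UTF-8 sequences in a buffer by counting every byte that is
   not a continuation byte (10xxxxxx). Assumes valid UTF-8 input.
   Hypothetical helper name, chosen for this example. */
size_t utf8_count_buf(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* Count UTF-8 sequences in an open stream, reading it in chunks.
   Hypothetical helper name; open the stream with mode "rb". */
size_t utf8_count_stream(FILE *fp)
{
    unsigned char buf[4096];   /* arbitrary chunk size */
    size_t count = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        count += utf8_count_buf(buf, n);
    }
    return count;
}
```

Because every byte is classified on its own, a multi-byte sequence that happens to straddle two chunks is still counted exactly once. A caller would typically open the file with `fopen(path, "rb")`, pass the stream to `utf8_count_stream`, and close it afterwards.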
Keep in mind that this gives wrong counts if the file contains invalid UTF-8 sequences. Also, "UTF-8 characters" might mean different things. For example "