C: Most efficient way to determine how many bytes will be needed for a UTF-16 string from a UTF-8 string

粉色の甜心 2020-12-18 03:41

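I've seen some very clever code out there for converting between Unicode code points and UTF-8, so I was wondering if anybody has (or would enjoy devising) something similar for this: computing how large the UTF-16 result will be without actually converting the string.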

3 Answers

轻奢々 (original poster)

2020-12-18 04:32
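Very simple: count the lead bytes (every byte that is not a continuation byte in the range 0x80 to 0xBF), and count bytes 0xF0 and up twice. Each lead byte starts one code point, which takes one UTF-16 code unit; a lead byte of 0xF0 or higher starts a four-byte sequence for a code point above U+FFFF, which UTF-16 encodes as a surrogate pair (two code units).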
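Written as a formula (the notation is ours, not the answerer's), with $b_1, \ldots, b_n$ the bytes of a valid UTF-8 string and $[P]$ equal to 1 when $P$ holds and 0 otherwise:

$$
N_{16} \;=\; \sum_{i=1}^{n} \Big( [\,b_i < \mathtt{0x80} \;\lor\; b_i \ge \mathtt{0xC0}\,] \;+\; [\,b_i \ge \mathtt{0xF0}\,] \Big),
\qquad
\text{bytes} \;=\; 2\,N_{16}
$$

For example, é is `C3 A9`, giving $(1+0) + (0+0) = 1$ code unit, while U+1F600 is `F0 9F 98 80`, giving $(1+1) + 0 + 0 + 0 = 2$ code units (a surrogate pair).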

In code:
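```c
#include <stddef.h>
/* UTF-16 code units needed for the NUL-terminated, valid UTF-8 string s. */
size_t count(const unsigned char *s)
{
    size_t l;
    /* Skip continuation bytes (0x80-0xBF); count leads 0xF0+ twice (surrogate pair). */
    for (l=0; *s; s++) l+=(*s-0x80U>=0x40)+(*s>=0xf0);
    return l;
}
```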
Note: This function returns the length in UTF-16 code units. If you want the number of bytes needed, multiply by 2. If you're going to store a null terminator you'll also need to account for space for that (one extra code unit/two extra bytes).
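A minimal sketch of how the count might be used, assuming valid, NUL-terminated UTF-8 input. The names `utf16_units`, `utf16_bytes_with_nul` and `utf16_units_by_decoding` are hypothetical: the first reuses the answer's expression, the second adds room for a one-code-unit terminator as the note describes, and the third is a slower explicit decoder included only to cross-check the counting trick.

```c
#include <stddef.h>

/* Same rule as the answer's count(): every byte that is not a continuation
   byte (0x80-0xBF) starts a code point, and lead bytes 0xF0+ start 4-byte
   sequences that become surrogate pairs in UTF-16.
   Assumes s is NUL-terminated, valid UTF-8. */
size_t utf16_units(const unsigned char *s)
{
    size_t n = 0;
    for (; *s; s++)
        n += (*s - 0x80U >= 0x40) + (*s >= 0xf0);
    return n;
}

/* Hypothetical helper: bytes to allocate for the UTF-16 result,
   including a 2-byte (one code unit) terminator. */
size_t utf16_bytes_with_nul(const unsigned char *s)
{
    return (utf16_units(s) + 1) * 2;
}

/* Slower cross-check: decode each sequence explicitly and count one code
   unit for code points up to U+FFFF, two above that.
   Also assumes valid UTF-8; there is no error handling. */
size_t utf16_units_by_decoding(const unsigned char *s)
{
    size_t n = 0;
    while (*s) {
        unsigned long cp;
        int extra;
        if (*s < 0x80)      { cp = *s;        extra = 0; }
        else if (*s < 0xe0) { cp = *s & 0x1f; extra = 1; }
        else if (*s < 0xf0) { cp = *s & 0x0f; extra = 2; }
        else                { cp = *s & 0x07; extra = 3; }
        s++;
        while (extra-- > 0)
            cp = (cp << 6) | (*s++ & 0x3f); /* append 6 payload bits */
        n += (cp > 0xffff) ? 2 : 1;
    }
    return n;
}
```

For example, the bytes `41` (A), `C3 A9` (é), `E2 82 AC` (€) and `F0 9F 98 80` (U+1F600) give 1 + 1 + 1 + 2 = 5 code units, so `utf16_bytes_with_nul` returns 12 for that string. The byte-classifying version never decodes anything, which is why it is the cheaper of the two.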