C: Most efficient way to determine how many bytes will be needed for a UTF-16 string from a UTF-8 string

粉色の甜心 2020-12-18 03:41

I've seen some very clever code out there for converting between Unicode code points and UTF-8, so I was wondering if anybody has (or would enjoy devising) something equally clever for this.

3 answers
  •  轻奢々 (OP)
    2020-12-18 04:32

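    Very simple: count the number of lead bytes (every byte that is not a continuation byte in the range 0x80-0xBF), counting bytes 0xF0 and up twice. A lead byte of 0xF0 or above starts a four-byte sequence, which encodes a code point above U+FFFF and therefore needs a surrogate pair (two code units) in UTF-16.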
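    To make the rule concrete, here is a deliberately explicit sketch that classifies each byte by range instead of using a branchless expression. The function name `utf16_units_explicit` is hypothetical, and like the rule itself it assumes well-formed UTF-8 input:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts UTF-16 code units for a NUL-terminated
 * UTF-8 string by classifying every byte. Assumes well-formed UTF-8;
 * invalid bytes (C0, C1, F5-FF) are not rejected. */
size_t utf16_units_explicit(const unsigned char *s)
{
    size_t units = 0;
    assert(s != NULL);
    for (; *s; s++) {
        if (*s < 0x80)
            units += 1;     /* ASCII: one-byte sequence, one code unit */
        else if (*s < 0xC0)
            units += 0;     /* continuation byte: counted via its lead byte */
        else if (*s < 0xF0)
            units += 1;     /* lead of a 2- or 3-byte sequence: U+0080..U+FFFF */
        else
            units += 2;     /* lead of a 4-byte sequence: surrogate pair */
    }
    return units;
}
```

    For example, the UTF-8 bytes `A\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80` ("Aé€😀") give 1 + 1 + 1 + 2 = 5 code units. For valid UTF-8 this agrees with the branchless one-liner; the branches only spell out which byte ranges contribute what.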

    In code:

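    #include <stddef.h>

    /* Number of UTF-16 code units needed for the NUL-terminated UTF-8 string s. */
    size_t count(const unsigned char *s)
    {
        size_t l;
        /* +1 for every byte that is not a continuation byte (0x80-0xBF),
           +1 more for lead bytes 0xF0 and up (encoded as a surrogate pair). */
        for (l = 0; *s; s++) l += (*s - 0x80U >= 0x40) + (*s >= 0xf0);
        return l;
    }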
    

    Note: This function returns the length in UTF-16 code units, not bytes. To get the number of bytes needed, multiply by 2. If you are going to store a null terminator, you also need to reserve space for it (one extra code unit, i.e. two extra bytes). The input is assumed to be valid UTF-8; malformed sequences are not detected.
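    Turning that note into code, a minimal sketch that returns the allocation size in bytes, terminator included, might look like this. The names `utf16_units` and `utf16_buffer_bytes` are hypothetical; the sketch assumes 16-bit code units and valid UTF-8 input, and reuses the same counting expression as `count()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same counting rule as count(): lead bytes once, bytes 0xF0 and up twice. */
static size_t utf16_units(const unsigned char *s)
{
    size_t l;
    for (l = 0; *s; s++) l += (*s - 0x80U >= 0x40) + (*s >= 0xf0);
    return l;
}

/* Hypothetical helper: bytes to allocate for the UTF-16 conversion of s,
 * including one terminating zero code unit. Overflow is not checked. */
size_t utf16_buffer_bytes(const unsigned char *s)
{
    assert(s != NULL);
    return (utf16_units(s) + 1) * sizeof(uint16_t);
}
```

    With this, "Aé€😀" (5 code units) needs (5 + 1) × 2 = 12 bytes, and an empty string still needs 2 bytes for the terminator.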
