I've seen some very clever code out there for converting between Unicode codepoints and UTF-8, so I was wondering if anybody has (or would enjoy devising) this.
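Very simple: count the number of head bytes (every byte that is not a
continuation byte in the range 80 to BF), double-counting bytes F0
and up. Those bytes start 4-byte sequences, which encode code points
above U+FFFF and therefore need a surrogate pair in UTF-16.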
In code:
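#include <stddef.h>

size_t count(const unsigned char *s)
{
    size_t l;
    /* First term: 1 for any byte outside 0x80..0xBF (a head byte).
       Second term: 1 more for F0 and up (surrogate pair needed). */
    for (l=0; *s; s++) l+=(*s-0x80U>=0x40)+(*s>=0xf0);
    return l;
}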
Note: This function returns the length in UTF-16 code units, and it assumes the input is valid UTF-8. If you want the number of bytes needed, multiply by 2. If you're going to store a null terminator, you'll also need to account for space for that (one extra code unit, or two extra bytes).
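To make the sizing arithmetic in the note concrete, here is a minimal sketch. The names utf16_units and utf16_buffer_bytes are hypothetical (they are not from the code above), and the sketch assumes well-formed, NUL-terminated UTF-8 input. The head-byte test is spelled out as two comparisons rather than the unsigned-subtraction trick, but it should count the same bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: same counting rule as count() above.
   Assumes s is well-formed, NUL-terminated UTF-8. */
size_t utf16_units(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;
    for (; *p; p++) {
        n += (*p < 0x80 || *p >= 0xC0); /* head byte: one code unit */
        n += (*p >= 0xF0);              /* 4-byte sequence: second unit of the surrogate pair */
    }
    return n;
}

/* Hypothetical helper: bytes needed for a NUL-terminated UTF-16 copy of s. */
size_t utf16_buffer_bytes(const char *s)
{
    return (utf16_units(s) + 1) * sizeof(uint16_t);
}
```

For example, the string "a" followed by U+1F600 (bytes 61 F0 9F 98 80) is three code units, so a terminated UTF-16 buffer for it needs (3 + 1) * 2 = 8 bytes.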