I'm programming something that counts the number of UTF-8 characters in a file. I've already written the base code, but now I'm stuck in the part where the characters are su
See: https://en.wikipedia.org/wiki/UTF-8#Encoding
Each UTF-8 sequence consists of one leading byte followed by zero to three continuation bytes.
Continuation bytes always start with the bits 10 (10xxxxxx), and a leading byte never starts with that pattern.
You can use that information to count only the first byte of each UTF-8 sequence:
    /* b holds the current byte: count it unless it is a continuation byte (10xxxxxx) */
    if ((b & 0xC0) != 0x80) {
        count++;
    }
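Keep in mind this gives a wrong count if the file contains invalid UTF-8 sequences, because the check only looks at the top two bits of each byte and never verifies that a sequence is complete. Also, "UTF-8 characters" might mean different things. For example "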