Getting the string length on UTF-8 in C? [closed]

拈花ヽ惹草 submitted on 2020-01-07 07:56:11

Question


Can this be done using a method similar to this one:

As long as the current element of the string the user input via scanf is not \0, add one to the "length" int and then print out the length.

I would be very grateful if anybody could guide me through the least complex way possible as I am a beginner.

Thank you very much, have a good one!


Answer 1:


What do you mean by string length?

The number of bytes is easily obtained with strlen(s).
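The loop described in the question counts exactly the same thing as strlen: bytes up to the terminating '\0'. A minimal sketch of that loop, assuming the input is already in a null-terminated buffer (the name count_bytes is hypothetical, not part of any library):

```c
#include <stddef.h>  /* size_t */

/* Hypothetical helper: counts bytes before the terminating '\0',
   which is the same value strlen(s) returns. */
size_t count_bytes(const char *s) {
    size_t length = 0;
    while (s[length] != '\0') {
        length++;
    }
    return length;
}
```

For plain ASCII such as "hello" this gives 5, but for "h\xC3\xA9llo" (the word "héllo" in UTF-8) it gives 6, because é occupies two bytes.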

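The number of code points encoded in UTF-8 can be computed by counting the single-byte characters (0x01 to 0x7F) and the leading bytes of multi-byte sequences (bit pattern 11xxxxxx, 0xC0 to 0xFF, of which well-formed UTF-8 only uses 0xC2 to 0xF4), while ignoring continuation bytes (0x80 to 0xBF) and stopping at '\0'. Every code point contributes exactly one non-continuation byte, so counting those bytes counts the code points.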

Here is a simple function to do this:

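#include <stddef.h>  /* size_t */

/* Count code points: every byte except continuation bytes (10xxxxxx). */
size_t count_utf8_code_points(const char *s) {
    size_t count = 0;
    while (*s) {
        count += (*s++ & 0xC0) != 0x80;
    }
    return count;
}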

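This function assumes that the array pointed to by s holds a null-terminated, properly encoded UTF-8 string. On malformed input it simply counts the non-continuation bytes, which may not match the intended number of code points.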
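If the input cannot be trusted, the encoding could be checked while counting. The following is only a sketch under the assumption that rejecting the whole string on the first malformed sequence is acceptable. The name count_utf8_checked and its interface are hypothetical and not part of the original answer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true and stores the number of code points
   in *count if s is well-formed UTF-8; returns false (leaving *count
   untouched) at the first malformed sequence. */
bool count_utf8_checked(const char *s, size_t *count) {
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        unsigned char lead = *p++;
        int extra;          /* number of continuation bytes expected */
        unsigned long cp;   /* code point decoded so far */
        unsigned long min;  /* smallest code point allowed for this length */

        if (lead < 0x80)      { extra = 0; cp = lead;        min = 0; }
        else if (lead < 0xC0) { return false; }  /* stray continuation byte */
        else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if (lead < 0xF8) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else                  { return false; }  /* 0xF8..0xFF never occur */

        while (extra-- > 0) {
            /* A '\0' here also fails the test, so truncated
               sequences are rejected without reading past the end. */
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (unsigned long)(*p++ & 0x3F);
        }

        if (cp < min) return false;                      /* overlong form */
        if (cp > 0x10FFFF) return false;                 /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        n++;
    }

    *count = n;
    return true;
}
```

With this sketch, "h\xC3\xA9llo" is accepted with a count of 5, while a lone "\x80", a truncated "\xC3" and the overlong "\xC0\x80" are all rejected.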

Also note that this will compute the number of code points, not the number of characters displayed, as some of these may be encoded using multiple combining code points, such as <LATIN CAPITAL LETTER A> followed by <COMBINING ACUTE ACCENT>.
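To make the difference concrete, here is a small sketch that repeats the counting function next to two encodings of the same visible letter. The array names precomposed and decomposed are chosen only for illustration.

```c
#include <stddef.h>  /* size_t */

/* Same counting function as in the answer, repeated so that this
   example compiles on its own. */
size_t count_utf8_code_points(const char *s) {
    size_t count = 0;
    while (*s) {
        count += (*s++ & 0xC0) != 0x80;
    }
    return count;
}

/* Both strings are normally displayed as the single character "Á". */

/* U+00C1 LATIN CAPITAL LETTER A WITH ACUTE: one code point, two bytes. */
static const char precomposed[] = "\xC3\x81";

/* U+0041 LATIN CAPITAL LETTER A followed by U+0301 COMBINING ACUTE
   ACCENT: two code points, three bytes. */
static const char decomposed[] = "A\xCC\x81";
```

Calling count_utf8_code_points(precomposed) gives 1 and count_utf8_code_points(decomposed) gives 2, although both strings show up as one character on screen. Counting displayed characters would require Unicode text segmentation rules, which this function does not attempt.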



Source: https://stackoverflow.com/questions/32936646/getting-the-string-length-on-utf-8-in-c
