Handling special characters in C (UTF-8 encoding)

天涯浪人 2020-12-07 21:17

I'm writing a small application in C that reads a simple text file and then outputs the lines one by one. The problem is that the text file contains special characters like

4 Answers
  •  生来不讨喜
    2020-12-07 21:55

    First things first:

    1. Read the file contents into a buffer.
    2. Use libiconv or a similar library to convert the UTF-8 bytes to wchar_t, then work with the wide-character handling functions such as wprintf().
    3. Use the wide-character functions in C! Most file and output handling functions have a wide-character variant.
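
    Step 2 can be sketched with the iconv() interface that libiconv implements. This is a minimal sketch rather than a definitive implementation: utf8_to_wcs is a hypothetical helper name, and the "WCHAR_T" encoding name is assumed to be accepted by the local iconv (glibc and GNU libiconv accept it).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: convert `len` bytes of UTF-8 into `out`, which has
   room for `cap` wide chars including the terminator. Returns the number
   of wide chars written, or (size_t)-1 on failure. */
size_t utf8_to_wcs(const char *utf8, size_t len, wchar_t *out, size_t cap)
{
    assert(utf8 != NULL && out != NULL && cap > 0);

    /* Assumption: this iconv implementation knows the name "WCHAR_T". */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)utf8;          /* iconv() takes a non-const pointer */
    size_t in_left = len;
    char *dst = (char *)out;
    size_t out_left = (cap - 1) * sizeof(wchar_t);  /* keep room for L'\0' */

    size_t rc = iconv(cd, &in, &in_left, &dst, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;            /* invalid input or buffer too small */

    size_t n = (cap - 1) - out_left / sizeof(wchar_t);
    out[n] = L'\0';
    return n;
}
```

    The resulting wide string can then be handed to the wide-character functions from step 3. On some platforms the program has to be linked with -liconv.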

    Ensure that your terminal can handle UTF-8 output. Setting up the correct locale lets the C library handle a lot of the file decoding and conversion for you, depending on what you are doing.
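
    As a rough illustration of the locale route, the sketch below adopts the environment's locale and then lets the standard mbstowcs() do the decoding. It assumes the environment selects a UTF-8 locale; init_locale and decode_with_locale are hypothetical helper names.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Adopt the locale named by the environment (LANG / LC_* variables).
   Returns 1 on success, 0 if that locale is not available. */
int init_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Decode a NUL-terminated multibyte string using the current LC_CTYPE
   encoding. `out` has room for `cap` wide chars including the terminator.
   Returns the number of wide chars written, or (size_t)-1 if the bytes
   are not valid in the current locale's encoding. */
size_t decode_with_locale(const char *src, wchar_t *out, size_t cap)
{
    assert(src != NULL && out != NULL && cap > 0);

    size_t n = mbstowcs(out, src, cap - 1);
    if (n != (size_t)-1)
        out[n] = L'\0';   /* mbstowcs() does not terminate when it fills the buffer */
    return n;
}
```

    Calling init_locale() once at startup is usually enough. If it fails, or the locale is not a UTF-8 one, multibyte input will be rejected or misread, so the return values are worth checking.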

    Remember that the width of a code point in UTF-8 is variable: one to four bytes. This means you can't just seek to an arbitrary byte and begin reading as you would with ASCII, because you might land in the middle of a code point. Good libraries can recover from this in some cases.
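
    The variable width can be handled by inspecting the bit pattern of each byte. The following is a minimal sketch with hypothetical helper names (utf8_seq_len, utf8_sync); it only classifies bytes and does not validate overlong or out-of-range sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with `lead`, or 0 if
   `lead` is a continuation byte or cannot start a sequence.
   Note: overlong lead bytes (0xC0, 0xC1) and 0xF5..0xF7 are not rejected. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or invalid */
}

/* Step back from an arbitrary byte offset to the first byte of the
   code point that contains it. */
size_t utf8_sync(const char *buf, size_t pos)
{
    assert(buf != NULL);

    /* Continuation bytes always look like 10xxxxxx. */
    while (pos > 0 && ((unsigned char)buf[pos] & 0xC0) == 0x80)
        pos--;
    return pos;
}
```

    After seeking to an arbitrary offset, calling utf8_sync() first and then reading utf8_seq_len() bytes keeps the read aligned to a code point.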

    Here is some code (not mine) that demonstrates some usage of UTF-8 file reading and wide character handling in C.

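    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* "ccs=UTF-8" is a non-standard fopen() mode extension; support and
           exact spelling depend on the C library. */
        FILE *f = fopen("data.txt", "r, ccs=UTF-8");
        if (!f)
            return 1;

        /* Read one wide character at a time and print its value in hex. */
        for (wint_t c; (c = fgetwc(f)) != WEOF;)
            printf("%04X\n", (unsigned int)c);

        fclose(f);
        return 0;
    }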
    

    Links

    1. libiconv
    2. Locale data in C/GNU libc
    3. Some handy info
    4. Another good Unicode/UTF-8 in C resource
