How to use UTF-8 in C code?

耶瑟儿~ 2021-02-04 03:06

My setup: gcc-4.9.2, UTF-8 environment.

The following C program works with ASCII input, but not with UTF-8 input.

Create input file:

echo -n 'привет мир'


        
5 Answers
  •  甜味超标
    2021-02-04 03:59

    #define SIZE 10
    

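    A buffer size of 10 is too small to hold the UTF-8 string привет мир. The string has 10 characters, but in UTF-8 each of the 9 Cyrillic letters takes 2 bytes and the space takes 1, for 19 bytes in total. Try a larger value. On my system (Ubuntu 12.04, gcc 4.8.1), changing it to 20 worked perfectly.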

    UTF-8 is a multibyte encoding that uses between 1 and 4 bytes per character, so it is safer to use 40 as the buffer size above (10 characters at up to 4 bytes each), plus one more byte if the buffer must also hold a terminating NUL. There is a long discussion under "How many bytes does one Unicode character take?" that might be interesting.
