Is it possible to get GCC to compile UTF-8 with BOM source files?

眉间皱痕 提交于 2019-11-28 11:00:29
Adrian Cox

According to the GCC Wiki, this isn't supported yet. You can use -fextended-identifiers and pre-process your code to convert the identifiers to UCN. From the linked page:

perl -pe 'BEGIN { binmode STDIN, ":utf8"; } s/(.)/ord($1) < 128 ? $1 : sprintf("\\U%08x", ord($1))/ge;' 

See also g++ unicode variable name and Unicode Identifiers and Source Code in C++11?

While unicode identifiers are supported in gcc, UTF-8 input is not. Therefore, unicode identifiers have to be encoded using \uXXXX and \UXXXXXXXX escape codes. However, a simple one-line patch to the cpp preprocessor allows gcc and g++ to process UTF-8 input provided a recent version of iconv that support C99 conversions is also installed. Details are present at

https://www.raspberrypi.org/forums/viewtopic.php?p=802657

However, the patch is so simple it can be given right here.

diff -cNr gcc-5.2.0/libcpp/charset.c gcc-5.2.0-ejo/libcpp/charset.c
*** gcc-5.2.0/libcpp/charset.c  Mon Jan  5 04:33:28 2015
--- gcc-5.2.0-ejo/libcpp/charset.c  Wed Aug 12 14:34:23 2015
***************
*** 1711,1717 ****
    struct _cpp_strbuf to;
    unsigned char *buffer;

!   input_cset = init_iconv_desc (pfile, SOURCE_CHARSET, input_charset);
    if (input_cset.func == convert_no_conversion)
      {
        to.text = input;
--- 1711,1717 ----
    struct _cpp_strbuf to;
    unsigned char *buffer;

!   input_cset = init_iconv_desc (pfile, "C99", input_charset);
    if (input_cset.func == convert_no_conversion)
      {
        to.text = input;

Even with the patch, two command line options are needed to enable UTF-8 input. In particular, try something like

$ /usr/local/gcc-5.2/bin/gcc \
    -finput-charset=UTF-8 -fextended-identifiers \
    -o circle circle.c
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!