Unknown meta-character in C/C++ string literal?

这一生的挚爱 submitted on 2019-12-08 21:18:36

Question


I created a new project with the following code segment:

const char* strange = "(Strange??)";
cout << strange << endl;

resulting in the following output:

(Strange]

Thus translating '??)' -> ']'

Debugging it shows that my char* string literal is actually that value and it's not a stream translation. This is obviously not a meta-character sequence I've ever seen. Some sort of Unicode or wide char sequence perhaps? I don't think so however... I've tried disabling all related project settings to no avail.

Anyone have an explanation?

  • search : 'question mark, question mark, close brace' c c++ string literal

Answer 1:


What you're seeing is called a trigraph.

In written language by grown-ups, one question mark is sufficient for any situation. Don't use more than one at a time and you'll never see this again.

GCC ignores trigraphs by default in its GNU dialect modes because hardly anyone uses them intentionally. Enable them with the -trigraphs option (strict ISO modes such as -ansi also imply it), or tell the compiler to warn you about them with the -Wtrigraphs option.

Visual C++ 2010 also disables them by default and offers /Zc:trigraphs to enable them. I can't find anything about ways to enable or disable them in prior versions.
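One way to check what a particular build actually does is to let the compiler measure a literal that contains the sequence. The following is only a sketch: the names probe_length and trigraphs_replaced are made up for this example, and it assumes a C++11 or later compiler.

```cpp
#include <cstddef>

// Length of a probe literal typed as two question marks followed by a
// closing parenthesis. If the compiler performs trigraph replacement the
// literal collapses to a single ']' character; otherwise it keeps all three.
constexpr std::size_t probe_length() { return sizeof("??)") - 1; }

// True when this translation unit was compiled with trigraphs enabled.
constexpr bool trigraphs_replaced() { return probe_length() == 1; }
```

Printing trigraphs_replaced() from main under different options (for instance with and without GCC's -trigraphs) shows which mode a build is in. The compiler may warn about the probe literal itself, which is expected here.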




Answer 2:


Easy way to avoid the trigraph surprise: split a "??" string literal in two:

const char* strange = "(Strange??)";
const char* strange2 = "(Strange?" "?)";
/*                              ^^^ no punctuation */
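The split works because adjacent string literals are concatenated in a later translation phase than trigraph replacement. A minimal sketch of a compile-time guard for this, assuming C++11 for static_assert (the helper name strange_split is hypothetical):

```cpp
// Adjacent string literals are joined in translation phase 6, long after
// trigraph replacement in phase 1, so the two pieces below are never seen
// as a trigraph, whatever the compiler settings are.
static_assert(sizeof("(Strange?" "?)") == 12,
              "11 visible characters plus the terminating null");

// Hypothetical helper that returns the intended text.
inline const char* strange_split() { return "(Strange?" "?)"; }
```

If the literal were ever collapsed by trigraph replacement, the static_assert would fail at compile time instead of the program printing the wrong text.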

Edit
gcc has an option to warn about trigraphs: -Wtrigraphs (enabled with -Wall also)
end edit

Quotes from the Standard

    5.2.1.1 Trigraph sequences
1   Before any other processing takes place, each occurrence of one of the
    following sequences of three characters (called trigraph sequences
    [footnote 13]) is replaced with the corresponding single character.
           ??=      #               ??)      ]               ??!      |
           ??(      [               ??'      ^               ??>      }
           ??/      \               ??<      {               ??-      ~
    No other trigraph sequences exist. Each ? that does not begin one of
    the trigraphs listed above is not changed.

    5.1.1.2 Translation phases
1   The precedence among the syntax rules of translation is specified by
    the following phases.
         1.   Physical source file multibyte characters are mapped, in an
              implementation-defined manner, to the source character set
              (introducing new-line characters for end-of-line indicators)
              if necessary. Trigraph sequences are replaced by corresponding
              single-character internal representations.
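To make the table concrete, here is a rough runtime sketch of the replacement rule quoted above, applied to an ordinary std::string. It is an illustration only, not how any compiler implements phase 1, and the function names are invented for this example. The code keys on the third character so that no trigraph has to appear in its own source.

```cpp
#include <cstddef>
#include <string>

// Maps the third character of a trigraph to its replacement,
// or returns '\0' when the character does not complete a trigraph.
inline char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case ')':  return ']';
        case '!':  return '|';
        case '(':  return '[';
        case '\'': return '^';
        case '>':  return '}';
        case '/':  return '\\';
        case '<':  return '{';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Scans left to right and replaces each listed three-character sequence.
// A question mark that does not begin a listed trigraph is copied unchanged.
inline std::string replace_trigraphs(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            const char r = trigraph_replacement(src[i + 2]);
            if (r != '\0') {
                out += r;
                i += 3;
                continue;
            }
        }
        out += src[i];
        ++i;
    }
    return out;
}
```

Feeding it the text from the question (written with an escaped question mark so the input itself survives compilation) yields "(Strange]", matching the output the asker saw.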



Answer 3:


It's a Trigraph!




Answer 4:


??) is a trigraph.




Answer 5:


That's trigraph support. You can prevent trigraph interpretation by escaping either of the question marks:

const char* strange = "(Strange?\?)";



Answer 6:


It's a trigraph.




Answer 7:


Trigraphs are the reason. The talk about C in the article also applies to C++




Answer 8:


As mentioned several times, you're being bitten by a trigraph. See this previous SO question for more information:

  • Purpose of Trigraph sequences in C++?

You can fix the problem by using the '\?' escape sequence for the '?' character:

const char* strange = "(Strange\?\?)";

In fact, this is the reason for that escape sequence, which is somewhat mysterious if you're unaware of those damn trigraphs.
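A small sketch that checks the escape behaves as described, assuming a C++11 or later compiler (the helper name strange_escaped is hypothetical):

```cpp
#include <cstring>

// The escape denotes an ordinary question mark, nothing more.
static_assert('\?' == '?', "escaped and plain question marks are the same character");

// With both question marks escaped, no trigraph can form, so the literal
// keeps its 11 visible characters plus the terminating null.
static_assert(sizeof("(Strange\?\?)") == 12, "literal must not be collapsed");

// Hypothetical helper that returns the intended text.
inline const char* strange_escaped() { return "(Strange\?\?)"; }
```

Both assertions hold whether or not trigraph processing is enabled, which is what makes the escape a portable fix.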




Answer 9:


While trying to cross-compile on GCC it picked my sequence up as a trigraph:

So all I need to do now is figure out how to disable this in projects by default since I can only see it creating problems for me. (I'm using a US keyboard layout anyway)

The default behavior on GCC is to ignore but give a warning, which is much more sane and is indeed what Visual Studio 2010 will adopt as the standard as far as I know.



Source: https://stackoverflow.com/questions/1669448/unknown-meta-character-in-c-c-string-literal