How to properly add hex escapes into a string-literal?

大城市里の小女人 submitted on 2019-11-27 04:27:53

Use 3 octal digits:

char problem[] = "abc\022e";

or split your string:

char problem[] = "abc\x12" "e";

Why these work:

  • Unlike hex escapes, octal escapes are limited by the standard to at most three digits.

    6.4.4.4 Character constants

    ...

    octal-escape-sequence:
        \ octal-digit
        \ octal-digit octal-digit
        \ octal-digit octal-digit octal-digit
    

    ...

    hexadecimal-escape-sequence:
        \x hexadecimal-digit
        hexadecimal-escape-sequence hexadecimal-digit
    
  • String literal concatenation is defined to take place in a later translation phase than the conversion of escape sequences.

    5.1.1.2 Translation phases

    ...

    5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

    6. Adjacent string literal tokens are concatenated.

Since string literals are concatenated early in the compilation process, but only after escape sequences have been converted, you can just use:

char problem[] = "abc\x12" "e";

though you may prefer full separation for readability:

char problem[] = "abc" "\x12" "e";

For the language lawyers amongst us, this is covered in C11 5.1.1.2 Translation phases:

  5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

  6. Adjacent string literal tokens are concatenated.

Why am I asking? When you want to build a UTF-8 string as a constant, you have to use hex values for characters outside the range the ASCII table can hold.

Well, no. You don't have to. As of C11, you can prefix your string literal with u8, which tells the compiler to encode the string literal in UTF-8.

char solution[] = u8"no need to use hex-codes áé§µ";

(The same thing is supported by C++11 as well, by the way.)
