C standard: Character set and string encoding specification

难免孤独 2021-01-04 08:58

I found the C standard (C99 and C11) vague with respect to character/string code positions and encoding rules:

Firstly the standard defines the source characte

2 answers
  • 2021-01-04 09:37

    C is not greedy about character sets. There is no such thing as a "default character set"; both the source and the execution character set are implementation-defined, although in practice they are ASCII or UTF-8 on most modern systems.
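
    To make "implementation-defined" concrete, here is a minimal sketch. The helper names looks_like_ascii and digit_value are made up for illustration. The first one checks whether a few character constants happen to have their ASCII values on the current implementation. The second relies on one of the few ordering guarantees the standard does make (C99/C11 5.2.1): the decimal digits are contiguous and ascending.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether the execution character set
   agrees with ASCII for a few sample characters. The standard does
   not promise this; it is simply true on most current platforms. */
static bool looks_like_ascii(void)
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20;
}

/* Hypothetical helper: converts a decimal digit character to its value,
   or returns -1 for anything else. Portable because '0'..'9' are
   guaranteed to be contiguous, whatever the encoding. */
static int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

    Note that the same trick is not portable for letters: in EBCDIC, 'a' to 'z' are not contiguous, so c - 'a' is only safe where looks_like_ascii() returns true.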

  • 2021-01-04 09:48

    The standard doesn't specify a default encoding because existing practice already had C implemented on machines with lots of different encodings, for example Honeywell mainframes and IBM mainframes.

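    I would expect gcc to take its default from the character encoding of the current locale (the LC_CTYPE category; there is no LC_CHARSET category), but I've never tested it. Either way, the -finput-charset option lets you set the source encoding explicitly.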

    VC++ takes its default from a Control Panel setting. That setting's default varies according to the country in which Windows was purchased, and most users never change it, although they can change it while installing Windows or at any time afterwards.

    Trigraphs were invented so that a source program could be copied from an environment with one locale to an environment with a slightly different locale and still be compiled. For example if a Windows user in China uses trigraphs then a Windows user in Greece would be able to compile the same source program. However, if the locales differ too much, for example one using EBCDIC and one using EUC, trigraphs won't suffice.
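
    As a small illustration, the sketch below detects whether the compiler actually performs trigraph replacement. The names probe and trigraphs_enabled are made up for this example. It matters in practice: GCC only replaces trigraphs when given -trigraphs or a strict ISO mode such as -std=c99 or -std=c11, and C23 removed trigraphs from the language altogether.

```c
#include <stddef.h>

/* With trigraph replacement active, the three characters ??= inside the
   literal become a single '#', so the array holds 2 bytes ('#' plus the
   terminating null). Without it, the literal keeps all 3 characters and
   the array holds 4 bytes. */
static const char probe[] = "??=";

/* Hypothetical helper: returns 1 if the compiler replaced the trigraph,
   0 if it left the three characters untouched. */
static int trigraphs_enabled(void)
{
    return sizeof probe == 2;
}
```

    Compilers that ignore trigraphs may emit a warning for the literal above; that is expected here, since the literal exists only to probe the behaviour.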
