Question
In plain C, by the standard there are three distinct "character" types:
- plain `char`, whose signedness is implementation-defined
- `signed char`
- `unsigned char`
Let's assume at least C99, where `stdint.h` is already present (so you have the `int8_t` and `uint8_t` types as recommendable alternatives with explicit width to signed and unsigned chars).
For now it seems to me that the plain `char` type is only really useful (or necessary) when you need to interface with functions of the standard library such as `printf`, and is rather to be avoided in all other scenarios. Using `char` could lead to undefined behavior when it is signed on the implementation and you need to do any arithmetic on such data.
The problem of choosing an appropriate type is probably most apparent when dealing with, for example, Unicode text (or any code page using values above 127 to represent characters), which otherwise could be handled as a plain C string. However, the relevant `string.h` functions all accept `char`, and if such data is typed `char`, that imposes problems when trying to interpret it, for example in a display routine capable of handling its encoding.
What is the most recommendable method in such a case? Are there any particular reasons beyond this where it could be recommendable to use `char` over `stdint.h`'s appropriate fixed-width types?
Answer 1:
The `char` type is for characters and strings. It is the type expected and returned by all the string-handling functions. (*) You really should never have to do arithmetic on `char`, especially not the kind where signedness would make a difference.
`unsigned char` is the type to be used for raw data. For example, `memcpy()` or `fread()` interpret their `void *` arguments as arrays of `unsigned char`. The standard guarantees that any type can also be represented as an array of `unsigned char`. Any other conversion might be "signalling", i.e. triggering exceptions (ISO/IEC 9899:2011, section 6.2.6 "Representation of Types"). (**)
`signed char` is for when you need a signed integer of `char` size (for arithmetic).
(*): The character-handling functions in `<ctype.h>` are a bit oddball about this, as they cater for EOF (negative), and hence "force" the character values into the `unsigned char` range (ISO/IEC 9899:2011, section 7.4 Character handling). But since it is guaranteed that a `char` can be converted to `unsigned char` and back without loss of information as per section 6.2.6... you get the idea.
Where the signedness of `char` would make a difference -- in the comparison functions like `strcmp()` -- the standard dictates that `char` is interpreted as `unsigned char` (ISO/IEC 9899:2011, section 7.24.4 Comparison functions).
(**): Practically, it is hard to see how a conversion of raw data to `char` and back could be signalling where the same done with `unsigned char` would not be. But `unsigned char` is what that section of the standard says. ;-)
Answer 2:
Use `char` to store characters (the standard defines the behaviour only for basic execution character set elements, roughly the 7-bit ASCII characters).
Use `signed char` or `unsigned char` to get the corresponding arithmetic (signed and unsigned arithmetic have different properties for integers - `char` is an integer type).
This doesn't mean that you can't do arithmetic with raw chars, as stated:
> 6.2.5 Types - 3. An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.
So if you use only basic character set elements, arithmetic on them is correctly defined.
Source: https://stackoverflow.com/questions/48091302/when-to-use-the-plain-char-type-in-c