Question
In plain C, by the standard there are three distinct "character" types:
- plain `char`, whose signedness is implementation-defined
- `signed char`
- `unsigned char`
Let's assume at least C99, where `stdint.h` is already present (so you have the `int8_t` and `uint8_t` types as recommendable alternatives with explicit width to signed and unsigned chars).
For now it seems to me that the plain `char` type is only really useful (or necessary) when you need to interface with functions of the standard library such as `printf`, and is rather to be avoided in all other scenarios. Using `char` could lead to undefined behavior when it is signed on the implementation and you need to do any arithmetic on such data.
The problem of choosing an appropriate type is probably most apparent when dealing with, for example, Unicode text (or any code page using values above 127 to represent characters), which otherwise could be handled as a plain C string. However, the relevant `string.h` functions all accept `char`, and if such data is typed `char`, that imposes problems when trying to interpret it, for example in a display routine capable of handling its encoding.
What is the most recommendable method in such a case? Are there any particular reasons beyond this where it could be recommendable to use `char` over `stdint.h`'s appropriate fixed-width types?
Answer 1:
The `char` type is for characters and strings. It is the type expected and returned by all the string-handling functions. (*) You really should never have to do arithmetic on `char`, especially not the kind where signedness would make a difference.
`unsigned char` is the type to be used for raw data. For example, `memcpy()` or `fread()` interpret their `void *` arguments as arrays of `unsigned char`. The standard guarantees that any type can also be represented as an array of `unsigned char`. Any other conversion might be "signalling", i.e. triggering exceptions (ISO/IEC 9899:2011, section 6.2.6 "Representation of Types"). (**)
`signed char` is for when you need a signed integer of `char` size (for arithmetic).
(*): The character-handling functions in `<ctype.h>` are a bit oddball about this, as they cater for EOF (negative), and hence "force" the character values into the `unsigned char` range (ISO/IEC 9899:2011, section 7.4 Character handling). But since it is guaranteed that a `char` can be converted to `unsigned char` and back without loss of information as per section 6.2.6... you get the idea.
Where the signedness of `char` would make a difference -- in the comparison functions like `strcmp()` -- the standard dictates that `char` is interpreted as `unsigned char` (ISO/IEC 9899:2011, section 7.24.4 Comparison functions).
(**): Practically, it is hard to see how a conversion of raw data to `char` and back could be signalling where the same done with `unsigned char` would not be. But `unsigned char` is what that section of the standard says. ;-)
Answer 2:
Use `char` to store characters (the standard defines the behaviour only for basic execution character set elements, roughly the 7-bit ASCII characters).
Use `signed char` or `unsigned char` to get the corresponding arithmetic (signed and unsigned arithmetic have different properties for integers - `char` is an integer type).
This doesn't mean that you can't do arithmetic with raw chars, as stated:
> 6.2.5 Types - 3. An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.
So if you use only basic character set elements, arithmetic on them is correctly defined.
Source: https://stackoverflow.com/questions/48091302/when-to-use-the-plain-char-type-in-c