According to C11 WG14 draft version N1570:
The header <ctype.h> declares several functions useful for classifying and mapping cha
What does representable in a type mean?
Re-formulated, a type is a convention for what the underlying bit-patterns mean. A value is thus representable in a type if that type assigns that meaning to some bit-pattern.
A conversion (which might need a cast) is a mapping from a value (represented with a specific type) to a value (possibly different) represented in the target type.
Under the given assumption (that char is signed), CHAR_MIN is certainly negative, and the text you quoted leaves no room for interpretation:
Yes, it is undefined behavior, as unsigned char cannot represent any negative numbers.
If that assumption did not hold, your program would be well-defined, because CHAR_MIN would be 0, a valid value for unsigned char.
Thus, we have a case where it is implementation-defined whether the program is undefined or well-defined.
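A common way to stay on the well-defined side, whether or not char is signed, is to convert the character to unsigned char before calling the classification function. The following is a minimal sketch; char_is_space is a hypothetical wrapper name chosen for illustration, not something <ctype.h> provides.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical wrapper (not part of <ctype.h>).
   Converting to unsigned char first guarantees the argument passed to
   isspace() lies in [0, UCHAR_MAX], so the call is always well-defined,
   even when char is signed and c is negative. */
bool char_is_space(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

With this wrapper, char_is_space(CHAR_MIN) is well-defined on every implementation, whereas isspace(CHAR_MIN) is only well-defined where char is unsigned.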
As an aside, there is no guarantee that sizeof(int)>1 or INT_MAX >= CHAR_MAX, so int might not be able to represent all values possible for unsigned char.
As conversions are defined to be value-preserving, a signed char can always be converted to int.
But if it was negative, that does not change the impossibility of representing a negative value as an unsigned char. (The conversion itself is defined, as conversion from any integral type to any unsigned integral type is always defined; the result is reduced modulo UCHAR_MAX + 1. In C a narrowing conversion does not require a cast, though compilers may warn without one.)
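To make the distinction concrete: the conversion of a negative int to unsigned char is well-defined, but the result is a different value, which is exactly why the original value is not "representable". A small sketch, with convert_to_uchar and minus_one_wraps_to_max as hypothetical helper names:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the conversion is well-defined (C11 6.3.1.3/2).
   An out-of-range value is reduced modulo UCHAR_MAX + 1. */
unsigned char convert_to_uchar(int value)
{
    return (unsigned char)value;
}

/* -1 is not representable, so the conversion changes it to UCHAR_MAX. */
bool minus_one_wraps_to_max(void)
{
    return convert_to_uchar(-1) == UCHAR_MAX;
}
```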
If char is signed, this would be undefined behavior; otherwise it is well-defined, since CHAR_MIN would have the value 0. It is easier to see the intention and meaning of:
the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF
if we read section 7.4 Character handling <ctype.h> from the Rationale for International Standard—Programming Languages—C, which says (emphasis mine going forward):
Since these functions are often used primarily as macros, their domain is restricted to the small positive integers representable in an unsigned char, plus the value of EOF. EOF is traditionally -1, but may be any negative integer, and hence distinguishable from any valid character code. These macros may thus be efficiently implemented by using the argument as an index into a small array of attributes.
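The array-of-attributes idea from the Rationale can be sketched roughly as follows. This is only an illustration of the technique, not how any particular C library implements it; attr_table, SLOT, ATTR_SPACE, ATTR_DIGIT, my_isspace and my_isdigit are all made-up names, and only the space and digit classes of the "C" locale are filled in.

```c
#include <limits.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical attribute bits for this sketch. */
enum { ATTR_SPACE = 1, ATTR_DIGIT = 2 };

/* Map a valid argument (EOF or 0..UCHAR_MAX) to a table index.
   EOF is negative, so it lands on slot 0. */
#define SLOT(c) ((c) - EOF)

/* One slot per valid argument; unlisted slots are zero (no attributes). */
static const unsigned char attr_table[SLOT(UCHAR_MAX) + 1] = {
    [SLOT(' ')]  = ATTR_SPACE, [SLOT('\t')] = ATTR_SPACE,
    [SLOT('\n')] = ATTR_SPACE, [SLOT('\v')] = ATTR_SPACE,
    [SLOT('\f')] = ATTR_SPACE, [SLOT('\r')] = ATTR_SPACE,
    [SLOT('0')] = ATTR_DIGIT, [SLOT('1')] = ATTR_DIGIT,
    [SLOT('2')] = ATTR_DIGIT, [SLOT('3')] = ATTR_DIGIT,
    [SLOT('4')] = ATTR_DIGIT, [SLOT('5')] = ATTR_DIGIT,
    [SLOT('6')] = ATTR_DIGIT, [SLOT('7')] = ATTR_DIGIT,
    [SLOT('8')] = ATTR_DIGIT, [SLOT('9')] = ATTR_DIGIT,
};

/* No range check: any argument other than EOF or 0..UCHAR_MAX would
   index outside the table, which is why such calls are undefined. */
int my_isspace(int c) { return (attr_table[SLOT(c)] & ATTR_SPACE) != 0; }
int my_isdigit(int c) { return (attr_table[SLOT(c)] & ATTR_DIGIT) != 0; }
```

A lookup like this has no room for a negative character code other than EOF, which is consistent with the restricted domain the Rationale describes.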
So valid values are the integers that fit into an unsigned char, plus EOF, which is some implementation-defined negative number.
Even though this is the C99 rationale, the particular wording you are referring to does not change from C99 to C11, so the rationale still fits.
We can also find why the interface uses int as an argument, as opposed to char, in section 7.1.4 Use of library functions, which says:
All library prototypes are specified in terms of the “widened” types: an argument formerly declared as char is now written as int. This ensures that most library functions can be called with or without a prototype in scope, thus maintaining backwards compatibility with pre-C89 code. Note, however, that since functions like printf and scanf use variable-length argument lists, they must be called in the scope of a prototype.
The revealing quote (for me) is §6.3.1.3/1:
if the value can be represented by the new type, it is unchanged.
i.e., if the value has to be changed then the value can't be represented by the new type.
Therefore an unsigned type can't represent a negative value.
To answer the question in the title: "representable" refers to "can be represented" from §6.3.1.3 and is unrelated to "object representation" from §6.2.6.1.
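Read this way, "representable as an unsigned char" can be checked either as a range test or as "the conversion leaves the value unchanged". A minimal sketch of both formulations; the function names are hypothetical, and the second one assumes INT_MAX >= UCHAR_MAX, which holds on mainstream platforms but is not guaranteed by the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Range formulation: v is representable iff 0 <= v <= UCHAR_MAX. */
bool representable_as_uchar(int v)
{
    return v >= 0 && v <= UCHAR_MAX;
}

/* 6.3.1.3/1 formulation: v is representable iff converting it to
   unsigned char does not change the value.
   Assumes INT_MAX >= UCHAR_MAX so the round trip back to int is exact. */
bool unchanged_by_conversion(int v)
{
    return (int)(unsigned char)v == v;
}
```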
It seems trivial in retrospect. I might have been confused by the habit of treating b'\xFF', 0xff, 255, -1 as the same byte in Python:
>>> (255).to_bytes(1, 'big')
b'\xff'
>>> int.from_bytes(b'\xFF', 'big')
255
>>> 255 == 0xff
True
>>> (-1).to_bytes(1, 'big', signed=True)
b'\xff'
and by the disbelief that it is undefined behavior to pass a character to a character classification function, e.g., isspace(CHAR_MIN).
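The same distinction between a bit pattern and the value it represents can be sketched in C. The helper names below are hypothetical, and the -1 result assumes a two's complement signed char, which C11 does not require but which is what practically all current implementations use.

```c
#include <string.h>

/* Read the byte's value through the unsigned char type. */
int value_as_unsigned(unsigned char byte)
{
    return byte;
}

/* Reinterpret the same object representation as a signed char.
   Assumption: two's complement, so the pattern 0xFF means -1. */
int value_as_signed(unsigned char byte)
{
    signed char s;
    memcpy(&s, &byte, sizeof s);
    return s;
}
```

The byte 0xFF has one object representation but two values depending on the type used to read it; only the value, not the bit pattern, decides whether an argument is representable as an unsigned char.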