Printf function formatter

问题

Having following simple C++ code:

#include <stdio.h>

int main() {
    char c1 = 130;
    unsigned char c2 = 130;

    printf("1: %+u\n", c1);
    printf("2: %+u\n", c2);
    printf("3: %+d\n", c1);
    printf("4: %+d\n", c2);
    ...
    return 0;
}

the output is like that:

1: 4294967170
2: 130
3: -126
4: +130

Can someone please explain me the line 1 and 3 results?

I'm using Linux gcc compiler with all default settings.

回答1:

(This answer assumes that, on your machine, char ranges from -128 to 127, that unsigned char ranges from 0 to 255, and that unsigned int ranges from 0 to 4294967295, which happens to be the case.)

char c1 = 130;

Here, 130 is outside the range of numbers representable by char. The value of c1 is implementation-defined. In your case, the number happens to "wrap around," initializing c1 to static_cast<char>(-126).

printf("1: %+u\n", c1);

c1 is promoted to int, resulting in -126. Then, it is interpreted by the %u specifier as unsigned int. This is undefined behavior. This time the resulting number happens to be the unique number representable by unsigned int that is congruent to -126 modulo 4294967296, which is 4294967170.

printf("3: %+d\n", c1);

The int value -126 is interpreted by the %d specifier as int directly, and outputs -126 as expected (?).

回答2:

In cases 1, 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems char and unsigned char are smaller than int, so they promote to int when passed as variadic arguments. int doesn't match the format specifier %u which requires unsigned int.

On exotic systems (which your target is not) where unsigned char is as large as int, it will be promoted to unsigned int instead, in which case 4 would have UB since it requires an int.

Explanation for 3 depends a lot on implementation specified details. The result depends on whether char is signed or not, and it depends on the representable range.

If 130 was a representable value of char, such as when it is an unsigned type, then 130 would be the correct output. That appears to not be the case, so we can assume that char is a signed type on the target system.

Initialising a signed integer with an unrepresentable value (such as char with 130 in this case) results in an implementation defined value.

On systems with 2's complement representation for signed numbers - which is ubiquitous representation these days - the implementation defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of char.

回答3:

A char is 8 bits. This means it can represent 2^8=256 unique values. A uchar represents 0 to 255, and a signed char represents -128 to 127 (could represent absolutely anything, but this is the typical platform implementation). Thus, assigning 130 to a char is out of range by 2, and the value overflows and wraps the value to -126 when it is interpreted as a signed char. The compiler sees 130 as an integer and makes an implicit conversion from int to char. On most platforms an int is 32-bit and the sign bit is the MSB, the value 130 easily fits into the first 8-bits, but then the compiler wants to chop of 24 bits to squeeze it into a char. When this happens, and you've told the compiler you want a signed char, the MSB of the first 8 bits actually represents -128. Uh oh! You have this in memory now 1000 0010, which when interpreted as a signed char is -128+2. My linter on my platform screams about this . .

I make that important point about interpretation because in memory, both values are identical. You can confirm this by casting the value in the printf statements, i.e., printf("3: %+d\n", (unsigned char)c1);, and you'll see 130 again.

The reason you see the large value in your first printf statement is that you are casting a signed char to an unsigned int, where the char has already overflowed. The machine interprets the char as -126 first, and then casts to unsigned int, which cannot represent that negative value, so you get the max value of the signed int and subtract 126.

2^32-126 = 4294967170 . . bingo

In printf statement 2, all the machine has to do is add 24 zeros to reach 32-bit, and then interpret the value as int. In statement one, you've told it that you have a signed value, so it first turns that to a 32-bit -126 value, and then interprets that -ve integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are 2 steps:

Signed char is promoted to signed int, because you want to work with ints. The char (is probably copied and) has 24 bits added. Because we're looking at a signed value, some machine instruction will happen to perform twos complement, so the memory here looks quite different.
The new signed int memory is interpreted as unsigned, so the machine looks at the MSB and interprets it as 2^32 instead of -2^31 as happened in the promotion.

An interesting bit of trivia, is you can suppress the clang-tidy linter warning if you do char c1 = 130u;, but you still get the same garbage based on the above logic (i.e. the implicit conversion throws away the first 24-bits, and the sign-bit was zero anyhow). I'm have submitted an LLVM clang-tidy missing functionality report based on exploring this question (issue 42137 if you really wanna follow it) 😉.

来源：https://stackoverflow.com/questions/56424618/printf-function-formatter

标签

c++

integer

printf

overflow

implicit-conversion