Output of a negative integer with the %u format specifier

Question

Consider the following code:

char c = 125;
c += 10;
printf("%d", c);    // outputs -121, which is understood
printf("%u", c);    // outputs 4294967175
printf("%u", -121); // outputs 4294967175

%d accepts negative numbers, so the output is -121 in the first case. The output in cases 2 and 3 is 4294967175, and I don't understand why.

Answer 1:

2³² - 121 = 4294967175

printf interprets the data you provide according to the % conversion specifier. With a 32-bit int:

%d: signed integer, value from -2³¹ to 2³¹-1
%u: unsigned integer, value from 0 to 2³²-1

Both integer values (-121 and 4294967175) have the same 32-bit pattern, shown here in hexadecimal:

`0xFFFFFF87`

See two's complement.

Answer 2:

printf is a function with variadic arguments. In that case the "default argument promotions" are applied to the arguments before the function is called. In your case, c is first converted from char to int and then passed to printf. The conversion does not depend on the corresponding '%' specifier in the format string. The value of this int argument can be interpreted as 4294967175 or -121 depending on signedness. The corresponding parts of the C standard are:

6.5.2.2 Function call

6 - ... If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.

7 - If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.

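Answer 3:

If char is signed in your compiler (which is the most likely case) and is 8 bits wide (extremely likely), then c += 10 produces 135, a value that does not fit in char. Strictly speaking, the addition is carried out in int after integer promotion, so the arithmetic itself does not overflow. It is the conversion of 135 back to a signed char that is out of range, and its result is implementation-defined (or an implementation-defined signal is raised). On common two's complement implementations the value wraps to -121, but the standard does not guarantee that, so you can't rely on this result portably.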

If char is unsigned (not very likely on most PC platforms), then see the other answers.

Answer 4:

printf uses something called variadic arguments. If you read up on them briefly, you'll find that a function that uses them does not know the types of the arguments you pass to it. Therefore there must be a way to tell the function how to interpret its input, and you do that with the format specifiers.

In your particular case, c is an 8-bit signed integer. When it holds the value -121, it is stored as 10000111. Then, through the integer promotion mechanism, it is converted to an int: 11111111111111111111111110000111.

With "%d" you tell printf to interpret 11111111111111111111111110000111 as a signed integer, so you get -121 as output. With "%u", however, you tell printf that 11111111111111111111111110000111 is an unsigned integer, so it outputs 4294967175.

EDIT: As stated in the comments, this behaviour is actually not guaranteed by the C standard. Passing a negative int where "%u" expects an unsigned int is formally undefined behaviour, and the bit pattern of a negative number depends on how the implementation encodes it (sign and magnitude, ones' complement, two's complement, ...). Therefore you may get an output other than 4294967175. But the main concepts I explained still hold: the same string of bits can be interpreted in different ways, and the type of the data is lost in variadic arguments.

Try to convert the number into base 10, first as a pure binary number, then knowing that it's stored in 32-bit two's complement: you get two different results. But if I do not tell you which interpretation to use, that binary string can represent anything (a 4-char ASCII string, a number, a small 8-bit 2x2 image, your safe combination, ...).

EDIT: You can think of "%<format_string>" as a sort of "extension" for that string of bits. When you create a file, you usually give it an extension, which is actually part of the filename, to remember in which format/encoding the file has been stored. Suppose you have your favorite song saved as song.ogg on your PC. If you rename the file to song.txt, song.odt, song.pdf, song, or song.akwardextension, that does not change the content of the file. But if you try to open it with the program usually associated with .txt or .whatever, that program reads the bytes in the file, and when it tries to interpret the sequences of bytes it may fail. That's why if you open song.ogg with Emacs, Vim or any other text editor you get something that looks like garbage, if you open it with, for instance, GIMP, GIMP cannot read it, and if you open it with VLC you hear your favorite song. The extension is just a reminder for you: it reminds you how to interpret that sequence of bits. As printf has no knowledge of that interpretation, you need to provide one, and if you tell printf that a signed integer is actually unsigned, well, it's like opening song.ogg with Emacs...

Source: https://stackoverflow.com/questions/47739874/output-of-negative-integer-to-u-format-specifier

Tags

format-specifiers