Why can't a C compiler do signed/unsigned comparisons in an intuitive way? [closed]

Question

By "intuitive" I mean given

int a = -1;
unsigned int b = 3;

the expression (a < b) should evaluate to 1.

There are already a number of questions on Stack Overflow asking why the C compiler complains about a signed/unsigned comparison in this or that particular case. The answers boil down to the integer conversion rules and such. Yet there does not seem to be a rationale for why the compiler has to be so exceptionally dumb when comparing signed and unsigned integers. Using the declarations above, why isn't an expression like

(a < b)

automatically replaced by

(a < 0 || (unsigned int)a < b)

when there is no single machine instruction to do it properly?

Now, there have been some comments on previous questions along the lines of "if you have to mix signed and unsigned integers, there is something wrong with your program". I don't buy that, since libc itself makes it impossible to live in a signed-only or unsigned-only world (e.g. the sprintf() family of functions returns int as the number of bytes written, send() returns ssize_t, and so on).

I also don't buy the idea, expressed in the comments below, that the implicit conversion of a signed integer to unsigned in a comparison (the (d - '0' < 10U) "idiom") gives the C programmer some additional power compared to an explicit cast (((unsigned int)(d - '0') < 10U)). What it certainly does is open up plenty of opportunities to screw up.

And yes, I'm happy that the compiler warns me that it cannot do it (unfortunately, only if I explicitly ask it to). The question is: why can't it? There are usually good reasons behind the standard's rules, so I'm wondering whether there are any here.

Answer 1:

The automatic replacement cannot be made because it differs from C semantics and would horribly break programs that use the conversion correctly. For example:

if (d-'0'<10U)  // false if d is not a digit

would become true for the ASCII space and many other characters with your proposed replacement.

By the way, I believe this question is partly a duplicate of:

Would it break the language or existing code if we'd add safe signed/unsigned compares to C/C++?

Answer 2:

In this case I'm sure it once again comes down to C (and C++) not making you pay for features you don't need. If the default behavior is satisfactory, you simply write the obvious code. If it's not sufficient for your needs, you write the two-part expression yourself, and only then pay the extra price. If the compiler always did what you suggest, you might pay a performance penalty even when the actual range of values in your program could never cause any problems.

Some compilers also provide a convenience/correctness warning to let you know you've entered territory where values of different signedness are being compared.

Answer 3:

The rules for the usual arithmetic conversions apply to the operands of almost all binary operators. They are a unified framework for dealing with a mix of integral types of different size and signedness in operations that (at least at the machine level) require equal types. The rules were designed to make implementation as simple and efficient as possible on common computer architectures. In particular, conversion between signed and unsigned int is generally a no-op on two's complement architectures, and the comparison remains a single instruction, either signed or unsigned.

An exception like the one you suggest would have been possible for the very special case of comparisons between signed and unsigned types. The cost would have been an irregularity in the rules for dealing with expression operands and a complicated implementation - a signed

The designers of C chose not to do so. Changing that decision now would break lots of existing code for limited benefit: you'll still encounter the usual arithmetic conversions with other operators, so you must be aware of them anyway.

Compilers warn (or can be made to warn) about conversions that may have surprising results, so that you are not caught off guard by an unintended mix of integers of differing signedness or size. Use casts to express exactly how you want the expression to be evaluated; that gets rid of the warnings and helps the next reader of your code.

Answer 4:

If I'm not mistaken, it's only a warning, and can therefore be disregarded.

The problem is the range of the integer variants.

A 32-bit signed integer can hold values from -2147483648 to 2147483647, while a 32-bit unsigned integer ranges from 0 to 4294967295.

That means that comparing a signed integer with an unsigned integer may produce outright wrong results, because internally the sign is represented by the most significant bit (MSB) of the integer.

An example:

You have the number -1 and the number 3,000,000,000. Which one is larger? Clearly the second one, you may say. But for the computer, -1 is actually larger: viewed 'as unsigned' (which is required to evaluate the large number correctly), -1 is represented as the maximum value, 4294967295.

Conversely, if both are treated as signed, the large number becomes a fairly large negative number, because it is outside the range of a signed integer (on common two's complement implementations, 3,000,000,000 becomes -1,294,967,296).

That's why the compiler issues this warning. While the actual error case is rather rare, it still MAY happen, and that is exactly what the compiler warns you about: something unexpected may happen when comparing two integers of different signedness.

Source: https://stackoverflow.com/questions/14485655/why-c-compiler-cannot-do-signed-unsigned-comparisons-in-an-intuitive-way

Tags

c++

comparison

unsigned

signed