c++ illogical >= comparison when dealing with vector.size() most likely due to size_type being unsigned

I could use a little help clarifying this strange comparison when dealing with vector.size() aka size_type

vector<cv::Mat> rebuiltFaces;
int rebuildIndex = 1;
cout << "rebuiltFaces size is " << rebuiltFaces.size() << endl;

while( rebuildIndex >= rebuiltFaces.size() ) {
    cout << (rebuildIndex >= rebuiltFaces.size()) << " , " << rebuildIndex << " >= " << rebuiltFaces.size() << endl;
    --rebuildIndex;
}

And what I get out of the console is

rebuiltFaces size is 0
1 , 1 >= 0
1 , 0 >= 0
1 , -1 >= 0
1 , -2 >= 0
1 , -3 >= 0

If I had to guess I would say the compiler is blindly casting rebuildIndex to unsigned and the +- but is causing things to behave oddly, but I'm really not sure. Does anyone know?

As others have pointed out, this is due to the somewhat counter-intuitive rules C++ applies when comparing values with different signedness; the standard requires the compiler to convert both values to unsigned. For this reason, it's generally considered best practice to avoid unsigned unless you're doing bit manipulations (where the actual numeric value is irrelevant). Regretfully, the standard containers don't follow this best practice.

If you somehow know that the size of the vector can never overflow int, then you can just cast the results of std::vector<>::size() to int and be done with it. This is not without danger, however; as Mark Twain said: "It's not what you don't know that kills you, it's what you know for sure that ain't true." If there are no validations when inserting into the vector, then a safer test would be:

while ( rebuildFaces.size() <= INT_MAX
        && rebuildIndex >= (int)rebuildFaces.size() )

Or if you really don't expect the case, and are prepared to abort if it occurs, design (or find) a checked_cast function, and use it.

On any modern computer that I can think of, signed integers are represented as two's complement. 32-bit int max is 0x7fffffff, and int min is 0x80000000, this makes adding easy when the value is negative. The system works so that 0xffffffff is -1, and adding one to that causes the bits to all roll over and equal zero. It's a very efficient thing to implement in hardware.

When the number is cast from a signed value to an unsigned value the bits stored in the register don't change. This makes a barely negative value like -1 into a huge unsigned number (unsigned max), and this would make that loop run for a long time if the code inside didn't do something that would crash the program by accessing memory it shouldn't.

Its all perfectly logical, just not necessarily the logic you expected.

Example...

$ cat foo.c
#include <stdio.h>

int main (int a, char** v) {
  unsigned int foo = 1;
  int bar = -1;

  if(foo < bar) printf("wat\n");
  return 0;
}

$ gcc -o foo foo.c
$ ./foo
wat
$

In C and C++ languages when unsigned type has the same or greater width than signed type, mixed signed/unsigned comparisons are performed in the domain of unsigned type. The singed value is implicitly converted to unsigned type. There's nothing about the "compiler" doing anything "blindly" here. It was like that in C and C++ since the beginning of times.

This is what happens in your example. Your rebuildIndex is implicitly converted to vector<cv::Mat>::size_type. I.e. this

rebuildIndex >= rebuiltFaces.size()

is actually interpreted as

(vector<cv::Mat>::size_type) rebuildIndex >= rebuiltFaces.size()

When signed value are converted to unsigned type, the conversion is performed in accordance with the rules of modulo arithmetic, which is a well-known fundamental principle behind unsigned arithmetic in C and C++.

Again, all this is required by the language, it has absolutely nothing to do with how numbers are represented in the machine etc and which bits are stored where.

Regardless of the underlying representation (two's complement being the most popular, but one's complement and sign magnitude are others), if you cast -1 to an unsigned type, you will get the largest number that can be represented in that type.

The reason is that unsigned 'overflow' behavior is strictly defined as converting the value to the number between 0 and the maximum value of that type by way of modulo arithmetic. Essentially, if the value is larger than the largest value, you repeatedly subtract the maximum value until your value is in range. If your value is smaller than the smallest value (0), you repeatedly add the largest value until it's in range. So if we assume a 32-bit size_t, you start with -1, which is less than 0. Therefore, you add 2^32, giving you 2^32 - 1, which is in range, so that's your final value.

Roughly speaking, C++ defines promotion rules like this: any type of char or short is first promoted to int, regardless of signedness. Smaller types in a comparison are promoted up to the larger type in the comparison. If two types are the same size, but one is signed and one is unsigned, then the signed type is converted to unsigned. What is happening here is that your rebuildIndex is being converted up to the unsigned size_t. 1 is converted to 1u, 0 is converted to 0u, and -1 is converted to -1u, which when cast to an unsigned type is the largest value of type size_t.

来源：https://stackoverflow.com/questions/12152605/c-illogical-comparison-when-dealing-with-vector-size-most-likely-due-to-s

标签

c++

size-type