Does C++20 well-define left shift for signed integers that “overflow”?

问题

In the current C++ Standard Draft, the left shift operator is defined as follows [expr.shift]:

The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N, where N is the width of the type of the result.

Consider int E1 = 2^31-1 = 2'147'483'647, E2 = 1, and int having 32 bits. Then there is an infinite number of numbers congruent to E1×2^E2 = 4'294'967'294 modulo 2^N = 2^32, namely, all the numbers 4'294'967'294 + k×2^32 where k is an arbitrary integer. Examples are 4'294'967'294 (k=0) or -2 (k=-1).

I don't understand what the Standard means by the unique value out of these numbers. Does it mean the unique value that can be represented by the resulting data type? Then, I suppose the result is defined as -2. Is this interpretation correct?

Until C++20, the definition was different and this case would cause undefined behavior. I suppose the change is related to the mandatory 2's-complement representation of negative signed integers.

In fact, there is now no more requirement for E1 to be non-negative. It therefore seems that -1 << 1 is defined as -2. Is that right as well?

回答1:

Does it mean the unique value that can be represented by the resulting data type

Yes. The set of numbers congruent to E1×2^E2 modulo 2^N is infinite, but there is only one value in any interval of size 2^N, therefore there is only one value representable in an integer type of width N.

If we look in the "p0907R1 Signed Integers are Two’s Complement" proposal we find a similar phrase with "unique representation" which makes this more clear:

Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2^N.

Then, I suppose the result is defined as -2. Is this interpretation correct?

Yes

On x64 the equivalent asm instruction is shlx (logical shift left)

I suppose the change is related to the mandatory 2-complement representation of negative signed integers.

Correct. As was the case with unsigned types, now also signed types they mathematically represent equivalence classes (well, it's not clear to me how much this is true as it looks like they want to still keep some UB cases on overflow).

回答2:

So we know that:

E1 = 2147483647
E2 = 1
N = sizeof(int) * CHAR_BIT = 4 * 8 = 32

Let's compute E1×2^E2 modulo 2^N (modulo is the remainder of the division):

x = E1×2^E2 mod 2^N = 2147483647 * 2 ^ 1 mod 4294967296 = 4294967294 mod 4294967296 = 4294967294

Then we go to here:

For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2 N has the same value of corresponding bits in its value representation.

and I think we also need:

The base-2 representation of a value of signed integer type is the base-2 representation of the congruent value of the corresponding unsigned integer type.

That means, that x = 4294967294 is equal to x = -2 for signed int. So the result will be -2.

It therefore seems that -1 << 1 is defined as -2. Is it right as well?

 (signed)-1 << 1 = 
 4294967295 << 1 = 
 4294967295 * 2 ^ 1 mod 4294967296 = 
 8589934590 mod 4294967296 = 
 4294967294 = 
 (signed)-2

来源：https://stackoverflow.com/questions/55468807/does-c20-well-define-left-shift-for-signed-integers-that-overflow

标签

c++

language-lawyer

bit-shift

c++20