Why won't left+(right-left)/2 overflow?

This article, http://googleresearch.blogspot.sg/2006/06/extra-extra-read-all-about-it-nearly.html, says that nearly all binary search and mergesort implementations have a bug in computing (left+right)/2, and that the fix is to use left+(right-left)/2 instead. The same fix is also given in the question "Bug in quicksort example (K&R C book)?".

My question is: why does left+(right-left)/2 avoid overflow, and how can this be proved? Thanks in advance.

You have left <= right by definition, and both are non-negative (they are array indices).

As a consequence, right - left >= 0, and it cannot overflow: subtracting a non-negative left from right gives a result no larger than right. Furthermore, left + (right - left) = right (basic algebra).

Consequently, left + (right - left) / 2 <= right. So no overflow can happen, since every intermediate value is bounded by right.
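The bounds in this argument can be sanity-checked exhaustively in a small range. The sketch below assumes a hypothetical maximum integer of 127 (the constant `MAX` and the class name are illustrative only) and confirms that each step of left + (right - left) / 2 stays within [0, right] for every pair 0 <= left <= right <= MAX. This is not a proof for 32-bit int, just a check of the same inequalities.

```java
// Exhaustive check in a hypothetical small range: pretend the largest integer
// is MAX = 127 and verify that every step of left + (right - left) / 2
// stays within [0, right] for all 0 <= left <= right <= MAX.
class BoundCheck {
    static final int MAX = 127; // hypothetical stand-in for the real int limit

    static int checkAll() {
        int checked = 0;
        for (int left = 0; left <= MAX; left++) {
            for (int right = left; right <= MAX; right++) {
                int diff = right - left; // step 1: never exceeds right, since left >= 0
                int half = diff / 2;     // step 2: never exceeds diff
                int mid = left + half;   // step 3: lands in [left, right]
                boolean ok = diff >= 0 && diff <= right
                        && half >= 0 && half <= diff
                        && mid >= left && mid <= right;
                if (!ok) {
                    throw new AssertionError("bound violated at " + left + ", " + right);
                }
                checked++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("pairs checked: " + checkAll()); // 8256
    }
}
```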

By contrast, consider the buggy expression (left + right) / 2. Here left + right >= right, and since we don't know the values of left and right, the sum can easily exceed the maximum integer and overflow.

Suppose (to make the example easier) the maximum integer is 100, left = 50, and right = 80. If you use the naive formula:

int mid = (left + right)/2;

the addition will result in 130, which overflows.

If you instead do:

int mid = left + (right - left)/2;

you can't overflow in (right - left) because you're subtracting a smaller (non-negative) number from a larger one. The result is never bigger than the larger number, so it can't possibly go over the maximum. E.g. 80 - 50 = 30.

And since the result is the average of left and right, it must lie between them. Both are at most the maximum integer, so anything between them is too, and there's no overflow.
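A toy model of this example, assuming (as the answer does) that the largest integer is 100. The `add`/`sub` helpers are hypothetical checked operations that throw when a result would exceed that limit; they only model the upper bound, which is the one that matters here.

```java
// Toy model of the example: pretend the largest representable integer is 100.
class ToyIntModel {
    static final int MAX = 100; // hypothetical limit, for illustration only

    // Hypothetical checked operations: throw if a result would exceed MAX.
    static int add(int a, int b) {
        int r = a + b; // real int arithmetic is fine for these small values
        if (r > MAX) throw new ArithmeticException("overflow: " + a + " + " + b + " = " + r);
        return r;
    }

    static int sub(int a, int b) {
        int r = a - b;
        if (r > MAX) throw new ArithmeticException("overflow: " + a + " - " + b + " = " + r);
        return r;
    }

    static int naiveMid(int left, int right) {
        return add(left, right) / 2; // 50 + 80 = 130 > MAX, so this throws
    }

    static int safeMid(int left, int right) {
        return add(left, sub(right, left) / 2); // 80 - 50 = 30, then 50 + 15 = 65
    }

    public static void main(String[] args) {
        System.out.println("safe:  " + safeMid(50, 80)); // 65
        try {
            naiveMid(50, 80);
        } catch (ArithmeticException e) {
            System.out.println("naive: " + e.getMessage());
        }
    }
}
```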

Basic logic.

1. By definition, left <= MAX_INT.
2. By definition, right <= MAX_INT.
3. left+(right-left) is equal to right, which is already <= MAX_INT per statement 2.
4. And so left+(right-left)/2 must also be <= MAX_INT, since (right-left)/2 <= right-left whenever right >= left.

Compare to the original

1. By definition, left <= MAX_INT.
2. By definition, right <= MAX_INT.
3. Therefore left+right <= MAX_INT.
4. And so (left+right)/2 <= MAX_INT.

where statement 3 is clearly false, since left can be MAX_INT (statement 1) and so can right (statement 2).
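One way to see which step breaks, sketched in Java: `Math.addExact` and `Math.subtractExact` (in `java.lang.Math` since Java 8) throw `ArithmeticException` on int overflow, so they can check each statement directly. The class and method names are illustrative, and the safe version assumes 0 <= left <= right.

```java
// Uses Math.addExact / Math.subtractExact, which throw ArithmeticException
// on int overflow, to test the individual steps of each argument.
class OverflowSteps {
    // Statement 3 of the naive argument: does left + right fit in an int?
    static boolean naiveSumOverflows(int left, int right) {
        try {
            Math.addExact(left, right);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Every step of left + (right - left) / 2, checked for overflow.
    // Assumes 0 <= left <= right.
    static int checkedSafeMid(int left, int right) {
        int diff = Math.subtractExact(right, left);
        return Math.addExact(left, diff / 2);
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(naiveSumOverflows(max, max)); // true: statement 3 fails
        System.out.println(checkedSafeMid(max, max));    // 2147483647, no exception
    }
}
```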

(This is more an intuitive explanation than a proof.)

Assume your data is unsigned char, with left = 100 and right = 255 (so right is at the edge of the range). If you do left + right, you'll get 355, which does not fit the unsigned char range, so it will overflow.

However, (right-left)/2 is a quantity X such that left + X <= right <= MAX, where MAX is 255 for unsigned char. This way, you can be sure that the sum never overflows.
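Java has no unsigned 8-bit type, so this sketch models the example with ints in 0..255 and masks with `0xFF` to mimic storing each intermediate result back into an unsigned char (keeping only the low 8 bits). The masking convention and class name are assumptions for illustration; note that in C the operands are promoted to int before the addition, so the wrap happens when the result is stored.

```java
// Models unsigned 8-bit values as ints in 0..255; toU8 keeps the low 8 bits,
// as storing into an unsigned char would.
class UnsignedByteMid {
    static int toU8(int x) {
        return x & 0xFF; // wrap modulo 256
    }

    static int naiveMid(int left, int right) {
        return toU8(left + right) / 2; // 355 wraps to 99, giving 49
    }

    static int safeMid(int left, int right) {
        return toU8(left + toU8(right - left) / 2); // 100 + 155 / 2 = 177, no wrap
    }

    public static void main(String[] args) {
        System.out.println(naiveMid(100, 255)); // 49 (wrong: not between 100 and 255)
        System.out.println(safeMid(100, 255));  // 177
    }
}
```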

A simple worked example will show it. For simplicity, assume numbers overflow above 999. If we have:

left = 997
right = 999

then:

left + right = 1996

which has already overflowed before we get to the /2. However:

right - left = 2
(right-left)/2 = 1
left + (right-left)/2 = 997 + 1 = 998

So we've avoided the overflow.

More generally (as others have said): if both left and right are within range and right >= left, then (right-left)/2 is within range, and so is left + (right-left)/2, since it cannot exceed right (you've increased left by only half the gap between it and right).
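In practice this midpoint sits inside a binary search loop, where left and right are non-negative indices with left <= right. A minimal sketch of a standard iterative binary search using it (the class and method names are hypothetical):

```java
// Iterative binary search over a sorted int array, using the
// overflow-safe midpoint.
class SafeBinarySearch {
    // Returns the index of key in the sorted array a, or -1 if absent.
    static int search(int[] a, int key) {
        int left = 0;
        int right = a.length - 1;
        while (left <= right) {
            // left and right are non-negative with left <= right here,
            // so mid stays in [left, right] and cannot overflow.
            int mid = left + (right - left) / 2;
            if (a[mid] < key) {
                left = mid + 1;
            } else if (a[mid] > key) {
                right = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9, 11};
        System.out.println(search(a, 7)); // 3
        System.out.println(search(a, 4)); // -1
    }
}
```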

Assume the language is Java, where the int data type is 32 bits and any value that exceeds the 32-bit range wraps around. In numerical terms, adding 1 to Integer.MAX_VALUE (2147483647) gives -2147483648.

Coming to the question above, let's assume the following:

int left = 1;
int right = Integer.MAX_VALUE;
int mid;

Case 1:

mid = (left + right)/2;
//Here left + right overflows and wraps around to -2147483648, so mid becomes -1073741824.

Case 2:

mid = left + (right - left)/2;
//This would not have the same problem as above as the value would never exceed "right".
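The two cases can be run as-is; this sketch just wraps them in methods (the class and method names are illustrative) with left = 1 and right = Integer.MAX_VALUE, relying on Java's defined two's-complement wraparound for int.

```java
// Runs Case 1 and Case 2 with left = 1 and right = Integer.MAX_VALUE.
// Java int arithmetic wraps around on overflow (two's complement).
class JavaMidCases {
    static int case1(int left, int right) {
        return (left + right) / 2; // sum wraps to -2147483648
    }

    static int case2(int left, int right) {
        return left + (right - left) / 2; // stays within [left, right]
    }

    public static void main(String[] args) {
        int left = 1;
        int right = Integer.MAX_VALUE;
        System.out.println(case1(left, right)); // -1073741824
        System.out.println(case2(left, right)); // 1073741824
    }
}
```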

In theory (with exact arithmetic), both expressions are the same:

left + (right - left)/2 = (2*left + right - left)/2 = (left + right)/2
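With Java's truncating integer division, the two formulas agree whenever left and right are non-negative and left + right does not overflow. With a negative left they can differ by one, because division truncates toward zero. A small sketch checking both observations (class and method names are illustrative):

```java
// Compares the two midpoint formulas under Java's truncating int division.
class MidEquality {
    static int naive(int left, int right) { return (left + right) / 2; }
    static int safe(int left, int right)  { return left + (right - left) / 2; }

    public static void main(String[] args) {
        // Agree for small non-negative values, where left + right cannot overflow.
        for (int left = 0; left <= 50; left++) {
            for (int right = left; right <= 50; right++) {
                if (naive(left, right) != safe(left, right)) {
                    throw new AssertionError(left + ", " + right);
                }
            }
        }
        System.out.println(naive(3, 8) + " " + safe(3, 8));   // 5 5
        // With a negative left, truncation toward zero makes them differ.
        System.out.println(naive(-3, 0) + " " + safe(-3, 0)); // -1 -2
    }
}
```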

Hope this answers your question.

Source: https://stackoverflow.com/questions/27167943/why-leftright-left-2-will-not-overflow

Tags

integer-overflow