When reading Lua's source code, I noticed that Lua uses a macro to round a double to a 32-bit int. I extracted the macro.
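For reference, it looks roughly like this (quoting Lua 5.1's luaconf.h from memory, so treat the exact spelling as approximate; in the real source it is guarded so it is only used on 32-bit x86, where long is 32 bits and the byte order is little-endian):

union luai_Cast { double l_d; long l_l; };
#define lua_number2int(i,d) \
  { volatile union luai_Cast u; u.l_d = (d) + 6755399441055744.0; (i) = u.l_l; }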
This kind of "trick" comes from older x86 processors, using the 8087 intructions/interface for floating point. On these machines, there's an instruction for converting floating point to integer "fist", but it uses the current fp rounding mode. Unfortunately, the C spec requires that fp->int conversions truncate towards zero, while all other fp operations round to nearest, so doing an
fp->int conversion requires first changing the fp rounding mode, then doing a fist, then restoring the fp rounding mode.
Now on the original 8086/8087, this wasn't too bad, but on later processors that started to get super-scalar and out-of-order execution, altering the fp rounding mode generally seriales the CPU core and is quite expensive. So on a CPU like a Pentium-III or Pentium-IV, this overall cost is quite high -- a normal fp->int conversion is 10x or more expensive than this add+store+load trick.
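In C terms, the expensive sequence looks roughly like the sketch below, which uses the C99/C++11 fenv interface to stand in for the fnstcw/fldcw/fist instruction dance (my illustration, not Lua's code):

#include <cfenv>
#include <cmath>

// What an x87-era compiler effectively had to emit for (int)d:
// save the rounding mode, switch to truncation, convert, restore.
// Each mode switch (fldcw) serializes the FPU pipeline.
int x87_style_cast(double d)
{
    int saved = std::fegetround();     // fnstcw: save the current mode
    std::fesetround(FE_TOWARDZERO);    // fldcw:  C casts must truncate
    int i = (int)std::lrint(d);        // fist:   convert in the current mode
    std::fesetround(saved);            // fldcw:  restore round-to-nearest
    return i;
}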
On x86-64, however, floating point is done with the XMM (SSE2) instructions, and the cost of converting fp->int is pretty small, so this "optimization" is likely slower than a normal conversion.
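For comparison, with SSE2 both flavors of conversion are a single instruction on x86-64 (again my sketch; check your compiler's output to confirm):

#include <cmath>

int truncated(double d) { return (int)d; }        // one cvttsd2si (truncate toward zero)
long nearest(double d)  { return std::lrint(d); } // typically one cvtsd2si (rounds using
                                                  // the current mode, nearest-even by default)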
Here is a simpler implementation of the above Lua trick:
/**
 * Round to the nearest integer.
 * Tie-breaks: round half to even (bankers' rounding).
 * Only works for inputs in the range [-2^51, 2^51].
 */
inline double rint(double d)
{
    const double x = 6755399441055744.0; // 2^51 + 2^52
    return d + x - x;
}
The trick works for numbers with absolute value < 2^51.
This is a little program to test it: ideone.com
#include <cstdio>

// rint() as defined above
inline double rint(double d)
{
    const double x = 6755399441055744.0; // 2^51 + 2^52
    return d + x - x;
}

int main()
{
    // round to the nearest integer
    printf("%.1f, %.1f\n", rint(-12345678.3), rint(-12345678.9));
    // test the tie-breaking rule
    printf("%.1f, %.1f, %.1f, %.1f\n", rint(-24.5), rint(-23.5), rint(23.5), rint(24.5));
    return 0;
}

// output:
// -12345678.0, -12345679.0
// -24.0, -24.0, 24.0, 24.0
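A value just outside that range shows where the trick breaks down: below 2^52 the halves are still representable, so the addition no longer forces rounding. This probe is my addition, not part of the original test:

#include <cstdio>

int main()
{
    const double x = 6755399441055744.0; // 2^51 + 2^52
    double d = -2251799813685248.5;      // -(2^51 + 0.5), just outside the range
    // d + x == 2^52 - 0.5, which is exactly representable,
    // so no rounding happens and d comes back unchanged:
    printf("%.1f\n", d + x - x);         // prints -2251799813685248.5
    return 0;
}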
A double is represented like this:

[figure: IEEE 754 double layout -- 1 sign bit, 11 exponent bits, 52 mantissa bits]

and it can be seen as two 32-bit integers; now, the int taken in all the versions of your code (supposing it's a 32-bit int) is the low-order one (the right half in the figure), so what you are doing in the end is just taking the lowest 32 bits of the mantissa.
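To make "taking the lowest 32 bits" concrete, here is a small sketch of mine (it assumes double is IEEE 754 binary64, which holds on virtually all modern platforms; unlike the union trick, reading the whole pattern through a uint64_t sidesteps the endianness question):

#include <cstdio>
#include <cstdint>
#include <cstring>

int main()
{
    double d = 1234.6 + 6755399441055744.0; // add the magic number
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);    // raw IEEE 754 encoding of d
    int32_t i = (int32_t)(uint32_t)bits;    // lowest 32 bits of the mantissa
    printf("%d\n", (int)i);                 // prints 1235
    return 0;
}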
Now, to the magic number: as you correctly stated, 6755399441055744 is 2^51 + 2^52; adding such a number forces the double into the "sweet range" between 2^52 and 2^53, which, as explained by Wikipedia here, has an interesting property:
Between 2^52 = 4,503,599,627,370,496 and 2^53 = 9,007,199,254,740,992, the representable numbers are exactly the integers.
This follows from the fact that the mantissa is 52 bits wide.
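You can check that spacing directly with nextafter (my snippet, not from the answer):

#include <cstdio>
#include <cmath>

int main()
{
    double p52 = 4503599627370496.0;              // 2^52
    printf("%.1f\n", std::nextafter(p52, 1e300)); // 4503599627370497.0: spacing is 1
    printf("%.1f\n", std::nextafter(p52, 0.0));   // 4503599627370495.5: spacing is 0.5 below
    return 0;
}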
The other interesting fact about adding 2^51 + 2^52 is that it affects the mantissa only in its two highest bits -- which are discarded anyway, since we are taking only its lowest 32 bits.
Last but not least: the sign.
IEEE 754 floating point uses a sign-and-magnitude representation, while integers on "normal" machines use 2's complement arithmetic; how is this handled here?
We talked only about positive integers; now suppose we are dealing with a negative number in the range representable by a 32-bit int, so with absolute value no greater than 2^31; call it -a. Such a number is obviously made positive by adding the magic number, and the resulting value is 2^52 + 2^51 + (-a).
Now, what do we get if we interpret the mantissa as a 2's complement representation? It must be the result of the 2's complement sum of (2^52 + 2^51) and (-a). Again, the first term affects only the upper two bits; what remains in bits 0-50 is the 2's complement representation of (-a) (again, minus the upper two bits).

Since reducing a 2's complement number to a smaller width is done just by cutting away the extra bits on the left, taking the lowest 32 bits correctly gives us (-a) in 32-bit, 2's complement arithmetic.
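Running a negative value through the explicit-bits sketch from above confirms this (my code, same IEEE 754 binary64 assumption):

#include <cstdio>
#include <cstdint>
#include <cstring>

int main()
{
    double d = -1234.6 + 6755399441055744.0; // rounds to 2^52 + 2^51 - 1235
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    int32_t i = (int32_t)(uint32_t)bits;     // low 32 bits, read as 2's complement
    printf("%d\n", (int)i);                  // prints -1235
    return 0;
}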