I was reading today about researchers discovering that NVidia's PhysX libraries use x87 FP rather than SSE2. Obviously this will be suboptimal for parallel datasets where speed trumps precision.
To make proper use of extended-precision math, a language must support a type which can store the results of intermediate computations and can be substituted for the expressions yielding those results. Thus, given:
#include <stdio.h>

void print_dist_squared(double x1, double y1, double x2, double y2)
{
    printf("%12.6f", (x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
}
there should be some type that could be used to capture and replace the common sub-expressions x2-x1 and y2-y1, allowing the code to be rewritten as:
void print_dist_squared(double x1, double y1, double x2, double y2)
{
    some_type dx = x2-x1;
    some_type dy = y2-y1;
    printf("%12.6f", dx*dx + dy*dy);
}
without altering the semantics of the program. Unfortunately, ANSI C failed to specify any type which could be used for some_type on platforms which perform extended-precision calculations, and it became far more common to blame Intel for the existence of extended-precision types than to blame ANSI's botched support.
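C99 did eventually provide a partial escape hatch: the FLT_EVAL_METHOD macro in <float.h> reports how an implementation evaluates intermediates, and when it is 2 (typical of x87 code generators) every intermediate already carries long double precision, so long double can serve as some_type without changing any result; when it is 0 (typical of SSE2 code generators), plain double is already faithful. A minimal sketch, assuming one of those two common cases:

#include <stdio.h>
#include <float.h>

void print_dist_squared(double x1, double y1, double x2, double y2)
{
#if FLT_EVAL_METHOD == 2
    /* x87-style evaluation: intermediates already have long double
       precision, so long double captures them without altering them */
    long double dx = x2 - x1;
    long double dy = y2 - y1;
    printf("%12.6Lf", dx*dx + dy*dy);
#else
    /* other evaluation methods (commonly 0, e.g. SSE2): intermediates
       are plain doubles anyway, so double itself is a faithful some_type */
    double dx = x2 - x1;
    double dy = y2 - y1;
    printf("%12.6f", dx*dx + dy*dy);
#endif
}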
In fact, extended-precision types have just as much value on platforms without floating-point units as they do on x87 processors, since on such platforms a computation like x+y+z would entail the following steps:

1. Unpack x into a separate mantissa and exponent.
2. Unpack y likewise.
3. Align the mantissas according to the difference in exponents, and add them.
4. Round the result to a 53-bit mantissa.
5. Pack the rounded mantissa and exponent back into double format.
6. Unpack that intermediate result into a mantissa and exponent again.
7. Unpack z.
8. Align the mantissas and add.
9. Round the result to a 53-bit mantissa.
10. Pack the final mantissa and exponent into double format.
Using an extended-precision type allows steps 4, 5, and 6 to be eliminated. Since a 53-bit mantissa is too large to fit in fewer than four 16-bit registers or two 32-bit registers, an addition with a 64-bit mantissa is no slower than one with a 53-bit mantissa; extended-precision math thus offers faster computation with no downside in a language which supports a proper type to hold temporary results. There is no reason to fault Intel for providing an FPU which could perform floating-point math in the fashion that was also the most efficient method on non-FPU chips.
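To make the step count concrete, here is a deliberately simplified soft-float sketch (the names unpacked, unpack, add, and pack are illustrative, not from any real soft-float library): it handles only positive, normal doubles, truncates where real code would round, and assumes modest exponent differences. Its structure is the point: the three-operand sum does three unpacks, two adds, and one pack, so the intermediate sum never incurs the round/pack/unpack of steps 4 through 6.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Illustrative unpacked form: keeping intermediates like this, rather
   than repacking them into a double after every operation, is what
   eliminates steps 4-6. */
typedef struct {
    uint64_t mant;  /* mantissa with the implicit leading 1 made explicit */
    int      exp;   /* unbiased exponent */
} unpacked;

static unpacked unpack(uint64_t bits)  /* steps 1, 2, and 7; step 6 never happens */
{
    unpacked u;
    u.mant = (bits & 0x000FFFFFFFFFFFFFull) | 0x0010000000000000ull;
    u.exp  = (int)((bits >> 52) & 0x7FF) - 1023;
    return u;
}

static unpacked add(unpacked a, unpacked b)  /* steps 3 and 8 */
{
    if (a.exp < b.exp) { unpacked t = a; a = b; b = t; }
    if (a.exp - b.exp > 63) return a;   /* b too small to matter */
    b.mant >>= (a.exp - b.exp);         /* align mantissas (truncating) */
    a.mant += b.mant;
    if (a.mant & 0x0020000000000000ull) {  /* carry out: renormalize */
        a.mant >>= 1;
        a.exp  += 1;
    }
    return a;
}

static uint64_t pack(unpacked u)  /* step 10; this sketch truncates, so
                                     step 9's rounding is elided as well */
{
    return ((uint64_t)(u.exp + 1023) << 52)
         | (u.mant & 0x000FFFFFFFFFFFFFull);
}

int main(void)
{
    double x = 1.5, y = 2.25, z = 4.0, s;
    uint64_t bx, by, bz, bs;
    memcpy(&bx, &x, 8); memcpy(&by, &y, 8); memcpy(&bz, &z, 8);
    /* x + y + z: three unpacks, two adds, one pack */
    bs = pack(add(add(unpack(bx), unpack(by)), unpack(bz)));
    memcpy(&s, &bs, 8);
    printf("%g\n", s);  /* prints 7.75 */
    return 0;
}

Repacking the intermediate after the first add would reintroduce exactly the round/pack/unpack trio described above, which is the cost a proper temporary type avoids.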