Extended (80-bit) double floating point in x87, not SSE2 - we don't miss it?


长发绾君心 2020-11-27 18:09

I was reading today about researchers discovering that NVidia's Phys-X libraries use x87 FP vs. SSE2. Obviously this will be suboptimal for parallel datasets where speed tr

4 Answers

遥遥无期 (OP)

2020-11-27 18:22
To make proper use of extended-precision math, it's necessary that a language support a type which can be used to store the result of intermediate computations, and can be substituted for the expressions yielding those results. Thus, given:
```
#include <stdio.h>

void print_dist_squared(double x1, double y1, double x2, double y2)
{
  printf("%12.6f", (x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
}
```
there should be some type that could be used to capture and replace the common sub-expressions x2-x1 and y2-y1, allowing the code to be rewritten as:
```
void print_dist_squared(double x1, double y1, double x2, double y2)
{
  some_type dx = x2-x1;
  some_type dy = y2-y1;
  printf("%12.6f", dx*dx + dy*dy);
}
```
without altering the semantics of the program. Unfortunately, ANSI C failed to specify any type which could be used for some_type on platforms which perform extended-precision calculations, and it became far more common to blame Intel for the existence of extended-precision types than to blame ANSI's botched support.
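For what it's worth, C99 (which postdates the original ANSI standard) added `float_t` and `double_t` to `<math.h>`, together with the `FLT_EVAL_METHOD` macro in `<float.h>`. On an implementation where `FLT_EVAL_METHOD` is 2, `double_t` is `long double`, so it can play the role of `some_type` above. The following is only a sketch under that assumption; the helper name `dist_squared` is hypothetical and not part of the original code.

```c
#include <math.h>   /* double_t (C99) */
#include <stdio.h>

/* Hypothetical helper: squared distance with intermediates held in double_t.
   double_t is at least as wide as double; when FLT_EVAL_METHOD == 2 it is
   long double, so dx and dy keep any extended precision the platform uses
   when evaluating x2 - x1 and y2 - y1. */
double dist_squared(double x1, double y1, double x2, double y2)
{
    double_t dx = x2 - x1;
    double_t dy = y2 - y1;
    return (double)(dx * dx + dy * dy);
}

void print_dist_squared(double x1, double y1, double x2, double y2)
{
    printf("%12.6f", dist_squared(x1, y1, x2, y2));
}
```

On platforms that evaluate `double` expressions in plain double precision, `double_t` is simply `double` and the rewrite changes nothing.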

In fact, extended-precision types have just as much value on platforms without floating-point units as they do on x87 processors, since on FPU-less processors a computation like x+y+z would entail the following steps:
1. Unpack the mantissa, exponent, and possibly sign of x into separate registers (exponent and sign can often "double-bunk")
2. Unpack y likewise.
3. Right-shift the mantissa of the value with the lower exponent, if any, and then add or subtract the values.
4. In case x and y had different signs, left-shift the mantissa until the leftmost bit is 1 and adjust the exponent appropriately.
5. Pack the exponent and mantissa back into double format.
6. Unpack that temporary result.
7. Unpack z.
8. Right-shift the mantissa of the value with the lower exponent, if any, and then add or subtract the values.
9. In case the earlier result and z had different signs, left-shift the mantissa until the leftmost bit is 1 and adjust the exponent appropriately.
10. Pack the exponent and mantissa back into double format.
Using an extended-precision type will allow steps 4, 5, and 6 to be eliminated. Since a 53-bit mantissa is too large to fit in fewer than four 16-bit registers or two 32-bit registers, performing an addition with a 64-bit mantissa isn't any slower than performing it with a 53-bit mantissa, so extended-precision math offers faster computation with no downside in a language which supports a proper type to hold temporary results. There is no reason to fault Intel for providing an FPU which performs floating-point math in the fashion that is also the most efficient method on non-FPU chips.
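The unpack, add, and pack steps listed above can be sketched in C roughly as follows. This is a simplified illustration, not a real soft-float library: the `unpacked` struct and all function names are hypothetical, only normal finite operands of the same sign are handled (so the renormalization of steps 4 and 9 never triggers), and packing truncates instead of rounding. `sum3_packed` performs the intermediate pack and unpack of steps 5 and 6, while `sum3_extended` keeps the 64-bit mantissa between the two additions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAC_MASK ((1ull << 52) - 1)   /* low 52 bits: stored fraction of a double */

/* Hypothetical unpacked form: the leading 1 is explicit in bit 63 of mant. */
typedef struct {
    int      sign;   /* 0 = positive, 1 = negative */
    int      exp;    /* unbiased exponent of the leading bit */
    uint64_t mant;   /* 64-bit mantissa */
} unpacked;

/* Steps 1, 2, 7: unpack a normal, finite double. */
unpacked unpack(double d)
{
    uint64_t bits;
    unpacked u;
    memcpy(&bits, &d, sizeof bits);
    int e = (int)((bits >> 52) & 0x7FF);
    assert(e != 0 && e != 0x7FF);   /* zero, subnormal, inf, NaN not handled */
    u.sign = (int)(bits >> 63);
    u.exp  = e - 1023;
    u.mant = ((bits & FRAC_MASK) | (1ull << 52)) << 11;
    return u;
}

/* Steps 5, 10: pack to double, dropping the low 11 mantissa bits
   (truncation, no rounding; exponent assumed to stay in range). */
double pack(unpacked u)
{
    uint64_t bits = ((uint64_t)u.sign << 63)
                  | ((uint64_t)(u.exp + 1023) << 52)
                  | ((u.mant >> 11) & FRAC_MASK);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Steps 3, 8 for same-sign operands: align the smaller value, then add. */
unpacked add_same_sign(unpacked a, unpacked b)
{
    assert(a.sign == b.sign);
    if (a.exp < b.exp) { unpacked t = a; a = b; b = t; }
    int shift = a.exp - b.exp;
    uint64_t bm  = shift < 64 ? b.mant >> shift : 0;
    uint64_t sum = a.mant + bm;
    if (sum < a.mant) {                    /* carry out of bit 63 */
        sum = (sum >> 1) | (1ull << 63);
        a.exp += 1;
    }
    a.mant = sum;
    return a;
}

/* x+y+z with the intermediate result packed to double (steps 5 and 6). */
double sum3_packed(double x, double y, double z)
{
    double t = pack(add_same_sign(unpack(x), unpack(y)));
    return pack(add_same_sign(unpack(t), unpack(z)));
}

/* x+y+z keeping the 64-bit mantissa between the two additions. */
double sum3_extended(double x, double y, double z)
{
    return pack(add_same_sign(add_same_sign(unpack(x), unpack(y)), unpack(z)));
}
```

With these definitions, `sum3_packed(1.0, 0x1p-53, 0x1p-53)` yields `1.0`, because each small addend is lost when the intermediate is packed, whereas `sum3_extended(1.0, 0x1p-53, 0x1p-53)` yields `1.0 + 0x1p-52` while doing less work.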