Extended (80-bit) double floating point in x87, not SSE2 - we don't miss it?

后端 未结 4 1949
长发绾君心
长发绾君心 2020-11-27 18:09

I was reading today about researchers discovering that NVidia\'s Phys-X libraries use x87 FP vs. SSE2. Obviously this will be suboptimal for parallel datasets where speed tr

4条回答
  •  遥遥无期
    2020-11-27 18:22

    To make proper use of extended-precision math, it's necessary that a language support a type which can be used to store the result of intermediate computations, and can be substituted for the expressions yielding those results. Thus, given:

    void print_dist_squared(double x1, double y1, double x2, double y2)
    {
      printf("%12.6f", (x2-x1)*(x2-x1)+(y2-y1)*(y2-y1));
    }
    

    there should be some type that could be used to capture and replace the common sub-expressions x2-x1 and y2-y1, allowing the code to be rewritten as:

    void print_dist_squared(double x1, double y1, double x2, double y2)
    {
      some_type dx = x2-x1;
      some_type dy = y2-y1;
      printf("%12.6f", dx*dx + dy*dy);
    }
    

    without altering the semantics of the program. Unfortunately, ANSI C failed to specify any type which could be used for some_type on platforms which perform extended-precision calculations, and it became far more common to blame Intel for the existence of extended-precision types than to blame ANSI's botched support.

    In fact, extended-precision types have just as much value on platforms without floating-point units as they do on x87 processors, since on such processors a computation like x+y+z would entail the following steps:

    1. Unpack the mantissa, exponent, and possibly sign of x into separate registers (exponent and sign can often "double-bunk")
    2. Unpack y likewise.
    3. Right-shift the mantissa of the value with the lower exponent, if any, and then add or subtract the values.
    4. In case x and y had different signs, left-shift the mantissa until the leftmost bit is 1 and adjust the exponent appropriately.
    5. Pack the exponent and mantissa back into double format.
    6. Unpack the that temporary result.
    7. Unpack z.
    8. Right-shift the mantissa of the value with the lower exponent, if any, and then add or subtract the values.
    9. In case the earlier result and z had different signs, left-shift the mantissa until the leftmost bit is 1 and adjust the exponent appropriately.
    10. Pack the exponent and mantissa back into double format.

    Using an extended-precision type will allow steps 4, 5, and 6 to be eliminated. Since a 53-bit mantissa is too large to fit in less than four 16-bit registers or two 32-bit registers, performing an addition with a 64-bit mantissa isn't any slower than using a 53-bit mantissa, so using extended-precision math offers faster computation with no downside in a language which supports a proper type to hold temporary results. There is no reason to fault Intel for providing an FPU which could perform floating-point math in the fashion that was also the most efficient method on non-FPU chips.

提交回复
热议问题