Properties of 80-bit extended precision computations starting from double precision arguments

情话喂你 2020-12-11 16:14

Here are two implementations of interpolation functions. The argument u1 is always between 0.0 and 1.0.

#include 

  •  被撕碎了的回忆
    2020-12-11 16:54

    The main source of precision loss in interpol_64 is the multiplications. Multiplying two 53-bit mantissas yields a 105- or 106-bit mantissa, depending on whether the high bit carries. That is too wide to fit in an 80-bit extended-precision value, whose significand holds only 64 bits, so in general the 80-bit version loses precision as well. Quantifying exactly when the final result differs is very difficult. The most that is easy to say is that it happens when rounding errors accumulate. Note that there is also a small rounding step when adding the two terms.

    Most people would probably just solve this problem with a function like:

    /* Linear interpolation: u2 at u1 = 0, approximately u3 at u1 = 1. */
    double interpol_64(double u1, double u2, double u3)
    {
      return u2 + u1 * (u3 - u2);
    }
    

    But it looks like you're looking for insight into the rounding issues, not a better implementation.
