Calculate the unit in the last place (ULP) for doubles

后端 未结 2 1236
佛祖请我去吃肉
佛祖请我去吃肉 2020-12-02 00:16

Does .NET have a built-in method to calculate the ULP of a given double or float?

If not, what is the most efficient way to do so?

相关标签:
2条回答
  • 2020-12-02 00:40

    It seems the function is pretty trivial; this is based on the pseudocode in the accepted answer to the question linked by vulkanino:

    double value = whatever;
    long bits = BitConverter.DoubleToInt64Bits(value);
    double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
    double result = nextValue - value;
    

    For floats, you'd need to provide your own implementation of SingleToInt32Bits and Int32BitsToSingle, since BitConverter doesn't have those functions.

    This page shows the special cases in the java implementation of the function; handling those should be fairly trivial, too.

    0 讨论(0)
  • 2020-12-02 00:59

    phoog answer is good but has weaknesses with negative numbers, max_double, infinity and NaN.

    phoog_ULP(positive x) --> a positive number. Good.
    phoog_ULP(negative x) --> a negative number. I would expect positive number.
    To fix this I recommend instead:

    long bits = BitConverter.DoubleToInt64Bits(value) & 0x7FFFFFFFFFFFFFFFL;
    

    Below are fringe cases that need resolution should you care...

    phoog_ULP(x = +/- Max_double 1.797...e+308) returns an infinite result. (+1.996...e+292) expected.
    phoog_ULP(x = +/- Infinity) results in a NaN. +Infinity expected.
    phoog_ULP(x = +/- NaN) may unexpectedly change from a sNan to a qNaN. No change expected. One could argue either way on if the sign should become + in this case.


    To solve these, I only see a short series of brutish if() tests to accommodate these, possible on the "bits" value for expediency. Example:

    double ulpc(double value) {
      long long bits = BitConverter::DoubleToInt64Bits(value);
      if ((bits & 0x7FF0000000000000L) == 0x7FF0000000000000L) { // if x is not finite
        if (bits & 0x000FFFFFFFFFFFFFL) { // if x is a NaN
          return value;  // I did not force the sign bit here with NaNs.
          } 
        return BitConverter.Int64BitsToDouble(0x7FF0000000000000L); // Positive Infinity;
        }
      bits &= 0x7FFFFFFFFFFFFFFFL; // make positive
      if (bits == 0x7FEFFFFFFFFFFFFFL) { // if x == max_double (notice the _E_)
        return BitConverter.Int64BitsToDouble(bits) - BitConverter.Int64BitsToDouble(bits-1);
      }
      double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
      double result = nextValue - fabs(value);
      return result;
    }
    
    0 讨论(0)
提交回复
热议问题