Calculate the unit in the last place (ULP) for doubles

后端未结

关注

 2  1242

佛祖请我去吃肉

Does .NET have a built-in method to calculate the ULP of a given double or float?

If not, what is the most efficient way to do so?

相关标签:

2条回答

借酒劲吻你

2020-12-02 00:40
It seems the function is pretty trivial; this is based on the pseudocode in the accepted answer to the question linked by vulkanino:
```
double value = whatever;
long bits = BitConverter.DoubleToInt64Bits(value);
double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
double result = nextValue - value;
```
For floats, you'd need to provide your own implementation of SingleToInt32Bits and Int32BitsToSingle, since BitConverter doesn't have those functions.

This page shows the special cases in the java implementation of the function; handling those should be fairly trivial, too.
0 讨论(0)
发布评论:

提交评论
- 加载中...
旧时难觅i

2020-12-02 00:59
phoog answer is good but has weaknesses with negative numbers, max_double, infinity and NaN.

phoog_ULP(positive x) --> a positive number. Good.
phoog_ULP(negative x) --> a negative number. I would expect positive number.
To fix this I recommend instead:
```
long bits = BitConverter.DoubleToInt64Bits(value) & 0x7FFFFFFFFFFFFFFFL;
```
Below are fringe cases that need resolution should you care...

phoog_ULP(x = +/- Max_double 1.797...e+308) returns an infinite result. (+1.996...e+292) expected.
phoog_ULP(x = +/- Infinity) results in a NaN. +Infinity expected.
phoog_ULP(x = +/- NaN) may unexpectedly change from a sNan to a qNaN. No change expected. One could argue either way on if the sign should become + in this case.

To solve these, I only see a short series of brutish if() tests to accommodate these, possible on the "bits" value for expediency. Example:
```
double ulpc(double value) {
  long long bits = BitConverter::DoubleToInt64Bits(value);
  if ((bits & 0x7FF0000000000000L) == 0x7FF0000000000000L) { // if x is not finite
    if (bits & 0x000FFFFFFFFFFFFFL) { // if x is a NaN
      return value;  // I did not force the sign bit here with NaNs.
      } 
    return BitConverter.Int64BitsToDouble(0x7FF0000000000000L); // Positive Infinity;
    }
  bits &= 0x7FFFFFFFFFFFFFFFL; // make positive
  if (bits == 0x7FEFFFFFFFFFFFFFL) { // if x == max_double (notice the _E_)
    return BitConverter.Int64BitsToDouble(bits) - BitConverter.Int64BitsToDouble(bits-1);
  }
  double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
  double result = nextValue - fabs(value);
  return result;
}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...