Companion to hypot()


Question


The hypot function, introduced into C in the 1999 revision of the language, calculates the hypotenuse of a right triangle given the other two sides as arguments, with care taken to avoid the overflow/underflow that would result from the naive implementation:

double hypot(double a, double b)
{
  return sqrt(a*a + b*b);
}

I find myself with the need for companion functionality: given a side and the hypotenuse of a triangle, find the third side (avoiding under/overflow). I can think of a few ways to do this, but wondered if there was an existing "best practice"?

My target is Python, but really I'm looking for algorithm pointers.


Thanks for the replies. In case anyone is interested in the result, my C99 implementation can be found here and a Python version here, part of the Hypothesis project.


Answer 1:


The first thing to do is factorize:

b = sqrt(h*h - a*a) = sqrt((h-a)*(h+a))

We have not only avoided some overflow, but also gained accuracy.

If either factor is close to 1E+154 = sqrt(1E+308) (roughly the maximum of an IEEE 754 64-bit float), then we must also avoid overflow in the product:

sqrt((h-a)*(h+a)) = sqrt(h-a) * sqrt(h+a)

This case is very unlikely, so the two sqrt calls are justified, even though they are slower than a single sqrt.

Notice that if h ~ 5E+7 * a, then h ~ b, which means there are not enough digits in a double to represent b as different from h.
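
As a minimal sketch of this approach (the function name other_leg and the exact 1E+154 threshold check are illustrative choices, not from the original answer):

#include <math.h>

/* compute sqrt(h*h - a*a) via the factored form; assumes 0 <= a <= h */
double other_leg(double h, double a)
{
    double p = h - a;
    double q = h + a;
    /* if h + a is large enough that (h-a)*(h+a) could overflow,
       fall back to the slower but always-safe product of square roots */
    if (q >= 1e154) {
        return sqrt(p) * sqrt(q);
    }
    return sqrt(p * q);
}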




Answer 2:


This answer assumes a platform that uses floating-point arithmetic compliant with IEEE-754 (2008) and provides fused multiply-add (FMA) capability. Both conditions are met by common architectures such as x86-64, ARM64, and Power. FMA is exposed in ISO C99 and later C standards as a standard math function fma(). On hardware that does not provide an FMA instruction, this requires emulation, which could be slow and functionally deficient.

Mathematically, the length of one leg (cathetus) in a right triangle, given the length of the hypotenuse and the other leg, is simply computed as √(h²-a²), where h is the length of the hypotenuse. But when computed with finite-precision floating-point arithmetic, we face two problems: Overflow or underflow to zero may take place when computing the squares, and subtraction of the squares gives rise to subtractive cancellation when the squares have similar magnitude.

The first issue is easily taken care of by scaling by 2ⁿ such that the term larger in magnitude is moved closer to unity. As subnormal numbers may be involved, this cannot be accomplished by manipulating the exponent field alone, as there may be a need to normalize/denormalize. But we can compute the required scale factors by exponent-field bit manipulation, then multiply by those factors. We know that the hypotenuse has to be longer than or the same length as the given leg in non-exceptional cases, so we can base the scaling on that argument.

Dealing with subtractive cancellation is harder, but we are lucky that a computation very similar to our h²-a² occurs in other important problems. For example, the grandmaster of floating-point computation looked into the accurate computation of the discriminant of the quadratic formula, b²-4ac:

William Kahan, "On the Cost of Floating-Point Computation Without Extra-Precise Arithmetic", Nov. 21, 2004 (online)

More recently, French researchers addressed the more general case of the difference of two products, ad-bc:

Claude-Pierre Jeannerod, Nicolas Louvet, Jean-Michel Muller, "Further analysis of Kahan's algorithm for the accurate computation of 2 x 2 determinants." Mathematics of Computation, Vol. 82, No. 284, Oct. 2013, pp. 2245-2264 (online)

The FMA-based algorithm in the second paper computes the difference of two products with a proven maximum error of 1.5 ulp. With this building block, we arrive at the straightforward ISO C99 implementation of the cathetus computation below. A maximum error of 1.2 ulp was observed in one billion random trials as determined by comparing with the results from an arbitrary-precision library:

#include <stdint.h>
#include <string.h>
#include <float.h>
#include <math.h>

/* reinterpret the bits of a double as a 64-bit unsigned integer */
uint64_t __double_as_uint64 (double a)
{
    uint64_t r;
    memcpy (&r, &a, sizeof r);
    return r;
}

/* reinterpret a 64-bit unsigned integer as a double */
double __uint64_as_double (uint64_t a)
{
    double r;
    memcpy (&r, &a, sizeof r);
    return r;
}

/*
  diff_of_products() computes a*b-c*d with a maximum error < 1.5 ulp

  Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller, 
  "Further Analysis of Kahan's Algorithm for the Accurate Computation 
  of 2x2 Determinants". Mathematics of Computation, Vol. 82, No. 284, 
  Oct. 2013, pp. 2245-2264
*/
double diff_of_products (double a, double b, double c, double d)
{
    double w = d * c;
    double e = fma (-d, c, w);
    double f = fma (a, b, -w);
    return f + e;
}

/* compute sqrt (h*h - a*a) accurately, avoiding spurious overflow */
double my_cathetus (double h, double a)
{
    double fh, fa, res, scale_in, scale_out, d, s;
    uint64_t expo;

    fh = fabs (h);
    fa = fabs (a);

    /* compute scale factors: scale_in moves fh to near unity; scale_out = 1/scale_in */
    expo = __double_as_uint64 (fh) & 0xff80000000000000ULL;
    scale_in = __uint64_as_double (0x7fc0000000000000ULL - expo);
    scale_out = __uint64_as_double (expo + 0x0020000000000000ULL);

    /* scale fh towards unity */
    fh = fh * scale_in;
    fa = fa * scale_in;

    /* compute sqrt of difference of scaled arguments, avoiding overflow */
    d = diff_of_products (fh, fh, fa, fa);
    s = sqrt (d);

    /* reverse previous scaling */
    res = s * scale_out;

    /* handle special arguments */
    if (isnan (h) || isnan (a)) {
        res = h + a;
    }

    return res;
}
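
For illustration, a minimal test harness (the specific values are chosen for this example, not taken from the original answer) might look like:

#include <stdio.h>

int main (void)
{
    /* 3-4-5 triangle: expect 4 */
    printf ("%.17g\n", my_cathetus (5.0, 3.0));
    /* arguments whose squares would overflow in a naive sqrt(h*h - a*a) */
    printf ("%.17g\n", my_cathetus (1e308, 1e307));
    return 0;
}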



Answer 3:


Assuming IEEE 754 basic 64-bit binary floating-point, I would consider an algorithm such as:

  • Set s (for scale) to be 2⁻⁵¹² if 2¹⁰⁰ ≤ a, 2⁺⁵¹² if a < 2⁻¹⁰⁰, and 1 otherwise.
  • Let a' be a•s and b' be b•s.
  • Compute sqrt(a'•a' − b'•b') / s.

Notes about the reasoning:

  • If a is large (or small), multiplying by s decreases (or increases) the values so that the square of a' remains in the floating-point range.
  • The scale factor is a power of two, so multiplying and dividing by it is exact in binary floating-point.
  • b is necessarily smaller than (or equal to) a, or else we return NaN, which is appropriate. In the case where we are increasing a, no error occurs; b' and b'•b' remain within range. In the case where we are decreasing a, b' may lose precision or become zero if b is small, but then b is so much smaller than a that the computed result cannot depend on the precise value of b in any case.
  • I partitioned the floating-point range into three intervals because two will not suffice. For example, if you set s to be 2⁻⁵¹² if 1 ≤ a and 2⁺⁵¹² otherwise, then 1 will scale to 2⁻⁵¹² and then square to 2⁻¹⁰²⁴, at which point a value of b slightly under 1 will be losing precision relevant to the result. But if you use a power of lesser magnitude for s, such as 2⁻⁵¹¹, then 2¹⁰²³ will scale to 2⁵¹² and square to 2¹⁰²⁴, which is out of bounds. Therefore, we need different scale factors for a = 1 and a = 2¹⁰²³. Similarly, a = 2⁻¹⁰⁴⁹ needs a scale factor that would be too large for a = 1. So three are needed.
  • Division is notoriously slow, so one might want to multiply by a prepared s−1 rather than dividing by s.
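
A minimal sketch of these steps in C, under the same IEEE 754 binary64 assumptions (the function name leg is illustrative; the hex float literals are C99):

#include <math.h>

/* compute sqrt(a*a - b*b), with a the hypotenuse, using three-interval scaling */
double leg (double a, double b)
{
    double s = 1.0;                       /* scale factor, an exact power of two */
    if (a >= 0x1p+100) {
        s = 0x1p-512;                     /* large inputs: scale down */
    } else if (a < 0x1p-100) {
        s = 0x1p+512;                     /* small inputs: scale up */
    }
    double as = a * s;                    /* exact, since s is a power of two */
    double bs = b * s;
    return sqrt (as * as - bs * bs) / s;  /* or multiply by a prepared 1/s */
}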



Answer 4:


hypot has its idiosyncrasies in that it is one of a very select few C standard library functions that does not always propagate NaN! (Another one is pow, for the case where the first argument is 1.)

Setting that aside, I'd be inclined to write merely

return sqrt(h * h - a * a); // h is the hypotenuse

as the body of the function, and burden the caller with checking the inputs. If you can't do that then follow the specification of hypot faithfully.
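
As a minimal sketch of what pushing the checks onto the caller might look like (the function name third_side and the assert-based contract are illustrative assumptions):

#include <assert.h>
#include <math.h>

/* caller guarantees both arguments are finite, 0 <= a <= h, and h small
   enough that h*h does not overflow; no special-case handling here */
double third_side (double h, double a)
{
    assert (isfinite (h) && isfinite (a) && 0.0 <= a && a <= h);
    return sqrt (h * h - a * a); // h is the hypotenuse
}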



Source: https://stackoverflow.com/questions/49191477/companion-to-hypot
