Comparing IEEE floats and doubles for equality

南方客 2020-11-30 06:00

What is the best method for comparing IEEE floats and doubles for equality? I have heard of several methods, but I wanted to see what the community thought.

15 answers
  • 2020-11-30 06:51

    If you have floating point errors you have even more problems than this. Although I guess that is up to personal perspective.

    Even if we do the numeric analysis to minimize accumulation of error, we can't eliminate it and we can be left with results that ought to be identical (if we were calculating with reals) but differ (because we cannot calculate with reals).
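    A minimal sketch of that effect, assuming `double` is IEEE 754 binary64 and intermediate results are not kept in extended precision (the function names are made up for illustration). Adding 0.1 ten times ought to give exactly 1 over the reals, but the rounded sum lands just below it:

```cpp
#include <cmath>

// Adds 0.1 to an accumulator n times; over the reals the result is n/10.
double sum_tenths(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += 0.1;   // 0.1 has no exact binary representation
    return sum;
}

// True only if the accumulated sum is exactly equal to 1.0.
bool ten_tenths_is_exactly_one()
{
    return sum_tenths(10) == 1.0;
}

// True if the accumulated sum is within `tolerance` of 1.0.
bool ten_tenths_is_close_to_one(double tolerance)
{
    return std::fabs(sum_tenths(10) - 1.0) < tolerance;
}
```

    Under those assumptions the exact test fails while the tolerance test passes for something like `1e-15`, which is the situation the paragraph above describes.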

  • 2020-11-30 06:53

    In numerical software you often want to test whether two floating point numbers are exactly equal. LAPACK is full of examples of such cases. The most common case is testing whether a floating point number equals "Zero", "One", "Two" or "Half". If anyone is interested, I can pick some algorithms and go into more detail.

    Also in BLAS you often want to check whether a floating point number is exactly Zero or One. For example, the routine dgemv can compute operations of the form

    • y = beta*y + alpha*A*x
    • y = beta*y + alpha*A^T*x
    • y = beta*y + alpha*A^H*x

    So if beta equals One you have a "plus assignment", and if beta equals Zero a "simple assignment". You can certainly cut the computational cost if you give these (common) cases special treatment.
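    A rough sketch of that special-casing, in C++ rather than Fortran. `gemv_sketch` is a hypothetical helper written for this illustration, not the real dgemv interface; it assumes a dense row-major matrix with `y.size()` rows and `x.size()` columns and does no size checking:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the BLAS API): y = beta*y + alpha*A*x,
// with A stored row-major as y.size() rows by x.size() columns.
void gemv_sketch(double alpha, const std::vector<double>& A,
                 const std::vector<double>& x,
                 double beta, std::vector<double>& y)
{
    const std::size_t m = y.size();
    const std::size_t n = x.size();

    // Exact comparisons are intentional: callers pass literal 0 or 1.
    if (beta == 0.0) {
        for (std::size_t i = 0; i < m; ++i)
            y[i] = 0.0;            // simple assignment: old y is never read
    } else if (beta != 1.0) {
        for (std::size_t i = 0; i < m; ++i)
            y[i] *= beta;          // general case: scale y first
    }
    // beta == 1.0: plus assignment, y is left as it is.

    if (alpha == 0.0)
        return;                    // nothing to add

    for (std::size_t i = 0; i < m; ++i) {
        double dot = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            dot += A[i * n + j] * x[j];
        y[i] += alpha * dot;
    }
}
```

    A tolerance test would be wrong here: a beta of 1e-12 is a legitimate scaling factor, not "almost Zero".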

    Sure, you could design the BLAS routines in such a way that you avoid exact comparisons (e.g. by using flags). However, LAPACK is full of examples where that is not possible.

    P.S.:

    • There are certainly many cases where you don't want to check for "is exactly equal". For many people this might even be the only case they ever have to deal with. All I want to point out is that there are other cases too.

    • Although LAPACK is written in Fortran, the logic is the same if you are using other programming languages for numerical software.

  • 2020-11-30 06:54

    If you are looking for two floats to be equal, then they should be identically equal in my opinion. If you are facing a floating point rounding problem, perhaps a fixed point representation would suit your problem better.

    Perhaps we cannot afford the loss of range or performance that such an approach would inflict.

  • 2020-11-30 06:57

    @DrPizza: I am no performance guru but I would expect fixed point operations to be quicker than floating point operations (in most cases).

    @Craig H: Sure. I'm totally okay with it printing that. If a or b store money then they should be represented in fixed point. I'm struggling to think of a real world example where such logic ought to be applied to floats. Things suitable for floats:

    • weights
    • ranks
    • distances
    • real world values (like from an ADC)

    For all these things, either you crunch the numbers and simply present the results to the user for human interpretation, or you make a comparative statement (even if such a statement is, "this thing is within 0.001 of this other thing"). A comparative statement like mine is only useful in the context of the algorithm: the "within 0.001" part depends on what physical question you're asking. That's my 0.02. Or should I say 2/100ths?
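    To make the money point concrete, here is a small sketch assuming amounts are stored as an integer number of cents. The `Cents` type and the helper functions are hypothetical names for this illustration, and `double` is assumed to be IEEE 754 binary64:

```cpp
#include <cstdint>

// Hypothetical money type: an integer count of cents (fixed point, scale 100).
struct Cents {
    std::int64_t value;
};

// Integer addition is exact (ignoring overflow), so == is meaningful.
Cents add(Cents a, Cents b) { return Cents{a.value + b.value}; }
bool equal(Cents a, Cents b) { return a.value == b.value; }

// The same sum done in binary floating point, for contrast:
// 0.10 + 0.20 is rounded and does not compare equal to 0.30.
bool float_sum_matches() { return 0.1 + 0.2 == 0.3; }
```

    With this representation, 10 cents plus 20 cents compares equal to 30 cents, while the floating point version of the same sum does not compare equal to 0.3.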

  • 2020-11-30 07:00

    @DrPizza: I am no performance guru but I would expect fixed point operations to be quicker than floating point operations (in most cases).

    It rather depends on what you are doing with them. A fixed-point type with the same range as an IEEE float would be many many times slower (and many times larger).

    Things suitable for floats:

    3D graphics, physics/engineering, simulation, climate simulation....

  • 2020-11-30 07:03

    The current version I am using is this:

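    #include <cmath>  // std::fabs

    // Returns true when A and B agree within an absolute OR a relative tolerance.
    bool is_equals(float A, float B,
                   float maxRelativeError, float maxAbsoluteError)
    {
      // Exact match. This also covers A == B == 0, which would otherwise
      // compute 0/0 below whenever maxAbsoluteError is 0.
      if (A == B)
        return true;

      // Absolute test: handles values close to zero.
      if (std::fabs(A - B) < maxAbsoluteError)
        return true;

      // Relative test: scale the difference by the larger magnitude.
      float relativeError;
      if (std::fabs(B) > std::fabs(A))
        relativeError = std::fabs((A - B) / B);
      else
        relativeError = std::fabs((A - B) / A);

      return relativeError <= maxRelativeError;
    }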
    

    This seems to take care of most problems by combining relative and absolute error tolerance. Is the ULP approach better? If so, why?
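    For comparison, a sketch of what a ULP-based test typically looks like. It assumes `float` is IEEE 754 binary32; `ordered_bits` and `almost_equal_ulps` are names made up for this illustration:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Maps a float's bit pattern to an integer that grows with the float's value,
// so adjacent floats map to adjacent integers. +0.0f and -0.0f both map to 0.
static std::int64_t ordered_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);   // well-defined way to read the bits
    if (u & 0x80000000u)             // sign bit set: mirror below zero
        return -static_cast<std::int64_t>(u & 0x7FFFFFFFu);
    return static_cast<std::int64_t>(u);
}

// True when a and b are at most maxUlps representable floats apart.
bool almost_equal_ulps(float a, float b, std::int64_t maxUlps)
{
    if (std::isnan(a) || std::isnan(b))
        return false;                // NaN never compares equal
    std::int64_t diff = ordered_bits(a) - ordered_bits(b);
    if (diff < 0)
        diff = -diff;
    return diff <= maxUlps;
}
```

    The ULP distance scales with magnitude much like a relative error does, so it has the same weakness near zero: 0.0f and 1e-10f are an enormous number of ULPs apart. An absolute tolerance is still needed for results that are expected to be close to zero.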
