Different results from similar floating-point functions

I have two functions that should do the same thing:

float ver1(float a0, float a1) {
    float r0 = a0 - a1;
    if (abs(r0) > PI) {
        if (r0 > 0) {
            r0 -= PI2;
        } else {
            r0 += PI2;
        }
    }
    return r0;
}

float ver2(float a0, float a1) {
    float a2 = a1 - PI2;

    float r0 = a0 - a1;
    float r1 = a0 - a2;

    if (abs(r0) < abs(r1)) {
        return r0;
    }
    if (abs(r0) > abs(r1)) {
        return r1;
    }

    return 0;
}

Note: PI and PI2 are float constants holding pi and 2*pi.

The odd thing is that they sometimes produce different results. For example, if you feed them 0.28605145 and 5.9433694, the first one returns 0.62586737 and the second one returns 0.62586755, and I can't figure out what's causing this.

If you manually calculate what the result should be, you'll find that the second answer is correct. I use this function in a 2D physics sim, and the really odd thing is that the first answer (the wrong one) works there, while the second one (the right one) makes it act all kinds of crazy. Such a tiny difference from an unknown source, and such a profound effect :|

At this point I'm switching to matrices anyway, but this odd situation got me curious. Does anybody know what's going on?

A float typically has about 24 bits of precision, or about 7 significant decimal digits.

You are subtracting two numbers of similar magnitude (r0+PI2 in the first, where r0 is negative for your inputs, and a1-PI2 in the second), and so are experiencing loss of significance: several of the most significant bits of the result are zero, so there are fewer bits left to represent the difference. That is why the answers match to only about 6 decimal places.
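To see where the bits go, here is a minimal sketch that isolates the two orders of operations for the inputs in the question. The function names and the constant `kTwoPi` are made up for this illustration (the question calls the constant PI2), and it assumes float arithmetic is evaluated in single precision, as on typical x86-64 or ARM64 builds.

```cpp
#include <cmath>

// Float nearest to 2*pi; stands in for the question's PI2 (assumed name).
constexpr float kTwoPi = 6.2831855f;

// ver1's order for these inputs: subtract first, then add 2*pi.
// The intermediate a0 - a1 is about -5.66. Floats between 4 and 8 are
// 2^-21 (about 4.8e-7) apart, so the low bits of a0 are rounded away here.
float subtractThenWrap(float a0, float a1) {
    float r0 = a0 - a1;
    return r0 + kTwoPi;
}

// ver2's order: shift a1 by 2*pi first, then subtract.
// For these inputs a1 and 2*pi are within a factor of two of each other,
// so a1 - kTwoPi is computed exactly; only the final subtraction rounds,
// and its result is below 1, where floats are much more closely spaced.
float wrapThenSubtract(float a0, float a1) {
    float a2 = a1 - kTwoPi;
    return a0 - a2;
}

// The same expression carried out in double, as a reference value.
double referenceInDouble(float a0, float a1) {
    return (static_cast<double>(a0) - static_cast<double>(a1))
           + static_cast<double>(kTwoPi);
}
```

Calling all three with `0.28605145f` and `5.9433694f` and printing with nine significant digits should show `wrapThenSubtract` landing closer to the double reference than `subtractThenWrap`, with the two float results a few units in the seventh decimal place apart.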

If you need more precision, then a double or a 32-bit or larger fixed-point representation might be more suitable than a float. There are also arbitrary-precision libraries available, such as GMP, which can represent numbers with all the precision you need, although arithmetic will be significantly slower than with built-in types.
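As a rough sketch of the double option, this is ver1's logic with double in place of float. The names `kPi`, `kTwoPi` and `angleDifference` are assumptions for this example, not part of the question's code.

```cpp
#include <cmath>

// Assumed double-precision constants; the question's PI and PI2 are floats.
constexpr double kPi = 3.14159265358979323846;
constexpr double kTwoPi = 2.0 * kPi;

// Same logic as ver1, using double throughout.
// A double carries about 53 bits of precision, so the rounding in the
// intermediate a0 - a1 is far below what a float result can show.
double angleDifference(double a0, double a1) {
    double r0 = a0 - a1;
    if (std::fabs(r0) > kPi) {
        // Wrap by one full turn toward zero.
        r0 += (r0 > 0.0) ? -kTwoPi : kTwoPi;
    }
    return r0;
}
```

The rounding error is still there, just much smaller, so this reduces the discrepancy rather than removing it.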

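You should use fabs() (or std::abs from <cmath>) instead of a plain abs(). The C library's abs() takes an int, so if that is the version that ends up being called, the argument is converted to an integer first and you'll get weird and wrong results with floating-point values.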
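A small sketch of the difference, with hypothetical helper names. The second function imitates what happens when the integer version of abs() is the one that gets called: the fractional part is gone before the absolute value is taken.

```cpp
#include <cmath>    // std::fabs, floating-point overloads of std::abs
#include <cstdlib>  // std::abs(int)

// Floating-point absolute value: keeps the fractional part.
float floatMagnitude(float x) {
    return std::fabs(x);
}

// Imitates the integer abs(): the float is converted to int first,
// which truncates toward zero, so anything in (-1, 1) becomes 0.
int integerMagnitude(float x) {
    return std::abs(static_cast<int>(x));
}
```

For an angle difference such as -0.5, `floatMagnitude` gives 0.5 while `integerMagnitude` gives 0, which would make comparisons like `abs(r0) > PI` behave very differently.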

Floating-point numbers don't behave like mathematical real numbers. Every operation on two of them may introduce a small rounding error. So I wouldn't call the first correct and the second incorrect just because of one example. You need to be careful with every operation you perform on floats if you want to keep the error small.

The error is generally smaller if the absolute values of the numbers are in the same range. If the ranges are very different, the error tends to be bigger.

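For example, 10000000.0 + 0.1 - 10000000.0 is hardly ever 0.1. In single precision it comes out as 0, because 0.1 is smaller than the spacing between adjacent floats near 10000000.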
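The example above can be checked with a short sketch like the following. The function names are made up for illustration, and it assumes float expressions are evaluated in single precision (no extended intermediate precision, no fast-math reordering).

```cpp
// Adds a small value to a large one and subtracts the large one again,
// in single precision. Floats near 1e7 are 1.0 apart, so adding 0.1
// rounds straight back to the large value.
float addThenSubtractFloat(float big, float small) {
    float sum = big + small;
    return sum - big;
}

// The same in double precision. Doubles near 1e7 are about 1.9e-9 apart,
// so most of the small value survives, but not all of its bits.
double addThenSubtractDouble(double big, double small) {
    double sum = big + small;
    return sum - big;
}
```

With `10000000.0f` and `0.1f` the float version returns exactly 0. The double version returns a value very close to 0.1 that still does not compare equal to `0.1`.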

If you know the ranges of the input you can adjust the code to reduce errors.

Source: https://stackoverflow.com/questions/12421536/different-results-from-similar-floating-point-functions

Tags

c++

floating-accuracy