Precise sum of floating point numbers

花落未央 2020-12-06 05:16

I am aware of a similar question, but I would like to ask for people's opinions on my algorithm for summing floating-point numbers as accurately as possible at a practical cost.

4 Answers

渐次进展 (OP)

2020-12-06 05:32
My guess is that your binary decomposition will work almost as well as Kahan summation.

Here is an example to illustrate it:
```
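#include <cmath>
#include <cstddef>

// Error-free addition: afterwards *a holds fl(*a + *b) and *b the rounding
// error, so the exact value of *a + *b is unchanged. volatile forces each
// intermediate to be stored as a float (no x87 extended precision).
void sumpair( float *a, float *b)
{
    // Fast2Sum needs the operand of larger magnitude first; std::max/std::min
    // on the raw values pick the wrong one when both operands are negative.
    bool a_is_big = std::fabs(*a) >= std::fabs(*b);
    volatile float big   = a_is_big ? *a : *b;
    volatile float other = a_is_big ? *b : *a;
    volatile float sum = big + other;
    volatile float small = sum - big;        // part of `other` absorbed into sum
    volatile float residue = other - small;  // part of `other` rounded away
    *a = sum;
    *b = residue;
}

void sumpairs( float *a,size_t size, size_t stride)
{
    if (size <= stride*2 ) {
        if( stride /* ... the rest of the listing is cut off in the source ... */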
```
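The `sumpair` step above has the shape of Dekker's Fast2Sum. That transformation's exactness guarantee requires the operand of larger magnitude (not larger value) to be subtracted first.

A minimal sketch to check this follows. It assumes IEEE single precision with round-to-nearest and no extended-precision intermediates. `SumErr`, `fast_two_sum`, and `two_sum_ordered` are hypothetical names, not part of the original listing.

```cpp
#include <cmath>

// Result of an error-free addition: sum + err == a + b exactly.
struct SumErr { float sum; float err; };

// Dekker's Fast2Sum. Precondition: |big| >= |other|.
// Assumes IEEE single precision, round-to-nearest, no excess precision.
SumErr fast_two_sum(float big, float other)
{
    float sum = big + other;
    float z = sum - big;    // the part of `other` that made it into `sum`
    float err = other - z;  // the part of `other` that was rounded away
    return {sum, err};
}

// Establishes the precondition by comparing magnitudes, not raw values.
SumErr two_sum_ordered(float a, float b)
{
    return std::fabs(a) >= std::fabs(b) ? fast_two_sum(a, b) : fast_two_sum(b, a);
}
```

For -1.0f + (-2^-30), the sum rounds to -1.0f. With the larger-magnitude operand first, `err` is exactly -2^-30. With the operands swapped, `sum - big` rounds to -1.0f and `err` comes out as 0, so the lost part disappears. Comparing raw values with `std::max` produces that swapped order whenever both operands are negative.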
I declared my operands `volatile` and compiled with `-ffloat-store` to avoid extra precision on the x86 architecture:

`g++ -ffloat-store -Wl,-stack_size,0x20000000 test_sum.c`

I got the following (0.03125 is 1 ULP at this magnitude):

- `naive sum=-373226.25`
- `dble prec sum=-373223.03`
- `1st approx sum=-373223`
- `2nd approx sum=-373223.06`
- `3rd approx sum=-373223.06`

This deserves a little explanation. I first display the naive summation, then the double-precision summation (Kahan is roughly equivalent to that).

The 1st approximation is the same as your binary decomposition, except that I store the sum in `data[0]` and take care to store the residues. This way, the exact sum of the data is the same before and after the summation.

This lets me approximate the error by summing the residues at the 2nd iteration, in order to correct the 1st iteration (equivalent to applying Kahan to the binary summation).

By iterating further I can refine the result even more, and the results converge.
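The scheme described above can be sketched as follows: store each pairwise sum in place, keep every rounding error as a residue, then sum the residues again to correct the result. The original listing is cut off, so this is not the answer's code.

`two_sum_inplace`, `pairwise_pass`, and `refined_sum` are hypothetical names. The sketch assumes strict IEEE single-precision arithmetic with round-to-nearest, for example SSE on x86-64 with no `-ffast-math`. That is the guarantee the `volatile`/`-ffloat-store` combination aims for on x86.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Error-free addition (Fast2Sum after ordering by magnitude): afterwards x
// holds fl(x + y) and y holds the rounding error, so x + y is exactly
// unchanged. Assumes IEEE single precision, round-to-nearest.
void two_sum_inplace(float& x, float& y)
{
    float big = x, other = y;
    if (std::fabs(big) < std::fabs(other)) std::swap(big, other);
    float sum = big + other;
    float err = other - (sum - big);
    x = sum;
    y = err;
}

// One bottom-up pairwise pass over a[0..n). At stride s, a[i] (i a multiple
// of 2s) absorbs a[i + s], and a[i + s] keeps the residue. Residue slots are
// never touched again in the same pass, so afterwards a[0] holds the pairwise
// sum and the exact sum of a[0..n) is what it was before.
void pairwise_pass(float* a, std::size_t n)
{
    for (std::size_t s = 1; s < n; s *= 2)
        for (std::size_t i = 0; i + s < n; i += 2 * s)
            two_sum_inplace(a[i], a[i + s]);
}

// passes == 1 gives the plain pairwise result. Each further pass k sums the
// residues a[k..n) into a[k], then folds a[k] back into a[k-1], ..., a[0].
float refined_sum(std::vector<float> a, std::size_t passes)  // works on a copy
{
    const std::size_t n = a.size();
    if (n == 0) return 0.0f;
    pairwise_pass(a.data(), n);                  // 1st approximation in a[0]
    for (std::size_t k = 1; k < passes && k < n; ++k) {
        pairwise_pass(a.data() + k, n - k);      // residue sum into a[k]
        for (std::size_t j = k; j > 0; --j)      // fold toward a[0]
            two_sum_inplace(a[j - 1], a[j]);
    }
    return a[0];
}
```

Take the input {2^24, 1, 1, 1}, whose exact sum 16777219 cannot be represented as a float:

- A left-to-right float loop returns 16777216.
- `refined_sum(v, 1)` returns 16777218.
- `refined_sum(v, 2)` returns 16777220, the correctly rounded value.

Each `two_sum_inplace` call leaves the exact sum of the array unchanged. As a result, the residues always account for exactly what the head `a[0]` is missing.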