Precise sum of floating point numbers

花落未央 2020-12-06 05:16

I am aware of a similar question, but I would like people's opinions on my algorithm for summing floating-point numbers as accurately as possible at a practical cost.

4 Answers
  •  渐次进展
    2020-12-06 05:32

    My guess is that your binary decomposition will work almost as well as Kahan summation.

    Here is an example to illustrate it:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    
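    // Error-free addition (Fast2Sum): on return *a holds the rounded sum and
    // *b the rounding error, so the exact value of *a + *b is unchanged.
    void sumpair( float *a, float *b)
    {
        // Fast2Sum is exact only when the larger-magnitude operand is the one
        // subtracted from the sum, so order by absolute value, not signed value.
        const bool a_is_big = std::fabs(*a) >= std::fabs(*b);
        const float big = a_is_big ? *a : *b;
        const float little = a_is_big ? *b : *a;
        volatile float sum = *a + *b;
        volatile float small = sum - big;
        volatile float residue = little - small;
        *a = sum;
        *b = residue;
    }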
    
    void sumpairs( float *a,size_t size, size_t stride)
    {
        if (size <= stride*2 ) {
            if( stride

    I declared my operands volatile and compiled with -ffloat-store to avoid extra precision on the x86 architecture:

    g++  -ffloat-store  -Wl,-stack_size,0x20000000 test_sum.c
    

    and I get the following (0.03125 is 1 ULP):

    naive      sum=-373226.25
    dble prec  sum=-373223.03
    1st approx sum=-373223
    2nd approx sum=-373223.06
    3rd approx sum=-373223.06
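
The ULP figure quoted above can be checked with a small sketch. The helper name `ulp_at` is hypothetical, and the sketch assumes that `float` is IEEE 754 single precision; it measures the gap between |x| and the next representable float above it.

```cpp
#include <cassert>
#include <cmath>

// Spacing between |x| and the next representable float of larger magnitude.
// Hypothetical helper for illustration; assumes IEEE 754 single precision.
float ulp_at(float x)
{
    assert(std::isfinite(x));
    const float ax = std::fabs(x);
    return std::nextafter(ax, INFINITY) - ax;
}
```

For x = -373223 the spacing is 2^-5 = 0.03125, because 2^18 <= 373223 < 2^19 and a float carries 24 significand bits.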
    

    This deserves a little explanation.

    • First I display the naive summation.
    • Then the double-precision summation (Kahan is roughly equivalent to that).
    • The 1st approximation is the same as your binary decomposition, except that I store the sum in data[0] and take care to store the residues. This way, the exact sum of the data is the same before and after the summation.
    • This lets me approximate the error by summing the residues in a 2nd iteration, which corrects the 1st iteration (equivalent to applying Kahan to the binary summation).
    • By iterating further I can refine the result, and we see it converge.
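
The steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the truncated program from this answer: `two_sum`, `pairwise_pass`, `refined_sum` and `naive_sum` are hypothetical names, and the pair step uses Knuth's branch-free TwoSum in place of `sumpair` because it needs no ordering of the operands. Each pass leaves the rounded total in `data[0]` and the rounding errors in the other slots, so (barring overflow) the exact sum of the array is preserved and a further pass folds the residues back in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Knuth's TwoSum: on return *a holds the rounded sum and *b the rounding
// error. volatile forces every intermediate to be stored as a float.
void two_sum(float *a, float *b)
{
    volatile float s   = *a + *b;
    volatile float bv  = s - *a;   // part of s contributed by *b
    volatile float av  = s - bv;   // part of s contributed by *a
    volatile float ea  = *a - av;  // what was lost from *a
    volatile float eb  = *b - bv;  // what was lost from *b
    volatile float err = ea + eb;
    *a = s;
    *b = err;
}

// One in-place pass of pairwise (binary-tree) summation.
// Afterwards data[0] is the rounded total; the other slots hold residues.
void pairwise_pass(float *data, std::size_t n)
{
    for (std::size_t stride = 1; stride < n; stride *= 2)
        for (std::size_t i = 0; i + stride < n; i += 2 * stride)
            two_sum(data + i, data + i + stride);
}

// Repeat the pass: pass 1 is the plain pairwise sum, later passes
// fold the stored residues back into data[0]. Works on a copy.
float refined_sum(std::vector<float> data, int passes)
{
    assert(passes >= 1);
    for (int p = 0; p < passes; ++p)
        pairwise_pass(data.data(), data.size());
    return data.empty() ? 0.0f : data[0];
}

// Plain left-to-right accumulation, for comparison.
float naive_sum(const std::vector<float> &data)
{
    volatile float s = 0.0f;
    for (float x : data)
        s = s + x;
    return s;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f}, whose exact sum is 2, the left-to-right loop gives 1, one pass gives 0, and two passes give 2.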
