What's a good way to add a large number of small floats together?

Say you have 100000000 32-bit floating point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this

result = 0.0;
for (i = 0; i < 100000000; i++) {
    result += array[i];
}

you'd run into problems as result gets much larger than 1.0.

So what are some of the ways to more accurately perform the summation?

Sounds like you want to use Kahan Summation.

According to Wikipedia,

The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).

In pseudocode, the algorithm is:
function kahanSum(input)
 var sum = input[1]
 var c = 0.0          //A running compensation for lost low-order bits.
 for i = 2 to input.length
  y = input[i] - c    //So far, so good: c is zero.
  t = sum + y         //Alas, sum is big, y small, so low-order digits of y are lost.
  c = (t - sum) - y   //(t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
  sum = t             //Algebraically, c should always be zero. Beware eagerly optimising compilers!
 next i               //Next time around, the lost low part will be added to y in a fresh attempt.
return sum

Tuomas Pelkonen

Make result a double, assuming C or C++.

If you can tolerate a little extra space (in Java):

float temp = new float[1000000];
float temp2 = new float[1000];
float sum = 0.0f;
for (i=0 ; i<1000000000 ; i++) temp[i/1000] += array[i];
for (i=0 ; i<1000000 ; i++) temp2[i/1000] += temp[i];
for (i=0 ; i<1000 ; i++) sum += temp2[i];

Standard divide-and-conquer algorithm, basically. This only works if the numbers are randomly scattered; it won't work if the first half billion numbers are 1e-12 and the second half billion are much larger.

But before doing any of that, one might just accumulate the result in a double. That'll help a lot.

If in .NET using the LINQ .Sum() extension method that exists on an IEnumerable. Then it would just be:

var result = array.Sum();

The absolutely optimal way is to use a priority queue, in the following way:

PriorityQueue<Float> q = new PriorityQueue<Float>();
for(float x : list) q.add(x);
while(q.size() > 1) q.add(q.pop() + q.pop());
return q.pop();

(this code assumes the numbers are positive; generally the queue should be ordered by absolute value)

Explanation: given a list of numbers, to add them up as precisely as possible you should strive to make the numbers close, t.i. eliminate the difference between small and big ones. That's why you want to add up the two smallest numbers, thus increasing the minimal value of the list, decreasing the difference between the minimum and maximum in the list and reducing the problem size by 1.

Unfortunately I have no idea about how this can be vectorized, considering that you're using OpenCL. But I am almost sure that it can be. You might take a look at the book on vector algorithms, it is surprising how powerful they actually are: Vector Models for Data-Parallel Computing

来源：https://stackoverflow.com/questions/2456338/whats-a-good-way-to-add-a-large-number-of-small-floats-together

标签

algorithm

floating-point

numerical