How to iteratively calculate a running weighted average so that the latest values weigh the most?


Question


I want to implement an iterative algorithm which calculates a weighted average. The specific weight law does not matter, but it should be close to 1 for the newest values and close to 0 for the oldest.

The algorithm should be iterative, i.e. it should not remember all previous values. It should know only the single newest value and some aggregate information about the past, like previous values of the average, sums, counts, etc.

Is it possible?

For example, the following algorithm can be:

void iterate(double value) {
   sum *= 0.99;
   sum += value;
   count++;
   avg = sum / count;
}

It will give exponentially decreasing weights, which may not be good. Is it possible to have step-decreasing weights or something similar?

EDIT 1

The requirements for the weighting law are as follows:

1) The weight decreases going into the past. 2) It has some mean or characteristic duration, so that values older than this duration matter much less than newer ones. 3) I should be able to set this duration.

EDIT 2

I need the following. Suppose v_i are the values, where v_1 is the first. Also suppose w_i are the weights, but w_0 belongs to THE LAST (newest) value.

So, after the first value arrives I have the first average

 a_1 = v_1 * w_0

After the second value v_2 arrives, I should have the average

 a_2 = v_1 * w_1 + v_2 * w_0

With the next value I should have

 a_3 = v_1 * w_2 + v_2 * w_1 + v_3 * w_0

Note that the weight profile moves with me as I move along the value sequence.

I.e. each value does not keep its own weight for all time. My goal is to have a value's weight decrease as it recedes into the past.
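
For reference, the quantity described above can be checked against a direct (non-iterative) computation. Here is a small Python sketch of exactly that sum (the function name is mine):

def weighted_sum_so_far(values, weights):
    # values: v_1..v_n in arrival order; weights: w_0, w_1, ... where w_0
    # applies to the newest value, as described above.
    n = len(values)
    return sum(v * weights[n - 1 - i] for i, v in enumerate(values))

# e.g. weighted_sum_so_far([v1, v2, v3], [w0, w1, w2]) == v1*w2 + v2*w1 + v3*w0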


Answer 1:


First a bit of background. If we were keeping a normal average, it would go like this:

average(a) = a
average(a,b) = (average(a)+b)/2
average(a,b,c) = (average(a,b)*2 + c)/3
average(a,b,c,d) = (average(a,b,c)*3 + d)/4

As you can see here, this is an "online" algorithm: we only need to keep track of two pieces of data, 1) the count of numbers in the average, and 2) the average itself. Then we can "undivide" the average by the old count, add in the new number, and divide by the new count.
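
A minimal sketch of that online update in Python (the names are mine):

def make_running_average():
    count = 0
    average = 0.0

    def add(value):
        nonlocal count, average
        # "undivide" by the old count, add the new number, divide by the new count
        average = (average * count + value) / (count + 1)
        count += 1
        return average

    return add

# >>> avg = make_running_average()
# >>> [avg(x) for x in (11, 5, 2)]
# [11.0, 8.0, 6.0]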

Weighted averages are a bit different. It depends on what kind of weighted average. For example if you defined:

weightedAverage(a,wa, b,wb, c,wc, ..., z,wz) = a*wa + b*wb + c*wc + ... + z*wz
 or
weightedAverage(elements, weights) = elements·weights

...then you don't need to do anything besides add the new element*weight! If however you defined the weighted average akin to an expected-value from probability:

weightedAverage(elements,weights) = elements·weights / sum(weights)

...then you'd need to keep track of the total weight. Instead of undividing by the total number of elements, you undivide by the old total weight, add in the new element*weight, then divide by the new total weight.

Alternatively, you don't need to undivide at all, as demonstrated below: you can merely keep track of the running dot product and total weight in a closure or an object, and divide as you yield each average (this can help a lot with avoiding numerical inaccuracy from compounded rounding errors).

In python this would be:

def makeAverager():
    dotProduct = 0    # running sum of value*weight
    totalWeight = 0   # running sum of weights

    def averager(newValue, weight):
        nonlocal dotProduct, totalWeight

        dotProduct += newValue*weight
        totalWeight += weight
        return dotProduct/totalWeight

    return averager

Demo:

>>> averager = makeAverager()
>>> [averager(value,w) for value,w in [(100,0.2), (50,0.5), (100,0.1)]]
[100.0, 64.28571428571429, 68.75]
>>> averager(10,1.1)
34.73684210526316
>>> averager(10,1.1)
25.666666666666668
>>> averager(30,2.0)
27.4



Answer 2:


> But my task is to have average recalculated each time new value arrives having old values reweighted. –OP

Your task is almost always impossible, even with exceptionally simple weighting schemes.

You are asking to yield, with O(1) memory, averages under a changing weighting scheme. For example, {values·weights1, (values+[newValue2])·weights2, (values+[newValue2,newValue3])·weights3, ...} as new values are passed in, for a nearly arbitrarily changing sequence of weight vectors. This is impossible because such aggregation is not injective: once you merge the numbers together, you lose a massive amount of information. For example, even if you had the weight vector, you could not recover the original value vector, or vice versa. There are only two cases I can think of where you could get away with this:

  • Constant weights such as [2,2,2,...2]: this is equivalent to an on-line averaging algorithm, which you don't want because the old values are not being "reweighted".
  • The relative weights of the previous values do not change. For example, you could use weights of [8,4,2,1] and add in a new element with arbitrary weight like ...+[1], but you must scale all the previous weights by the same multiplicative factor, like [16,8,4,2]+[1]. Thus at each step you are adding one new arbitrary weight and one new arbitrary rescaling of the past, so you have 2 degrees of freedom (only 1 if you need to keep your dot product normalized). The weight vectors you'd get would look like:

 

[w0]
[w0*(s1), w1]
[w0*(s1*s2), w1*(s2), w2]
[w0*(s1*s2*s3), w1*(s2*s3), w2*(s3), w3]
...

Thus any weighting scheme you can make look like that will work (unless you need to keep it normalized by the sum of weights, in which case you must also divide the new average by the new weight sum, which you can compute while keeping only O(1) memory). Merely multiply the previous average by the new s (which implicitly distributes over the dot product into the weights), and tack on the new w*newValue.
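
A sketch of that update in Python (names are mine, and I also keep the weight sum so the result stays normalized): for each new value the caller supplies its weight w and a rescaling factor s applied to all past weights.

def make_rescaling_averager():
    dot = 0.0    # running values·weights
    wsum = 0.0   # running sum of weights, for normalization

    def add(value, w, s):
        nonlocal dot, wsum
        dot = s * dot + w * value   # rescale the whole past, add the new term
        wsum = s * wsum + w
        return dot / wsum

    return add

# With a constant s < 1 and w = 1 this is an exponentially weighted average:
# >>> avg = make_rescaling_averager()
# >>> [round(avg(v, 1.0, 0.5), 3) for v in (10, 20, 30)]
# [10.0, 16.667, 24.286]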




Answer 3:


I think you are looking for something like this:

void iterate(double value) {
    count++;

    // weight assigned to the new value; falls linearly to 0 over the first 1000 samples
    weight = max(0.0, 1.0 - count / 1000.0);   // note: floating-point division

    // undivide by the old weight total, add the new term, divide by the new total
    avg = (avg * total_weight + weight * value) / (total_weight + weight);
    total_weight += weight;
}



Answer 4:


Here I'm assuming you want the weights to sum to 1. As long as you can generate a relative weight without it changing in the future, you can end up with a solution which mimics this behavior.

That is, suppose you defined your weights as a sequence {s_0, s_1, s_2, ..., s_n, ...} and defined the input as sequence {i_0, i_1, i_2, ..., i_n}.

Consider the form: (s_0*i_0 + s_1*i_1 + s_2*i_2 + ... + s_n*i_n) / (s_0 + s_1 + s_2 + ... + s_n). Note that it is trivially possible to compute this incrementally with a couple of aggregation counters:

int counter = 0;
double numerator = 0;
double denominator = 0;

void addValue(double val)
{
    // the weight of this value depends only on its position in the stream
    double weight = calculateWeightFromCounter(counter);
    counter++;
    numerator += weight * val;
    denominator += weight;
}

double getAverage()
{
    if (denominator == 0.0) return 0.0;
    return numerator / denominator;
}

Of course, calculateWeightFromCounter() in this case doesn't need to generate weights that sum to one -- the trick here is that we average by dividing by the sum of the weights, so in the end the weights effectively behave as if they summed to one.

The real trick is how you implement calculateWeightFromCounter(). You could simply return the counter itself, for example; however, note that then the newest weight would not necessarily be anywhere near the sum of all the earlier weights, so you may not end up with exactly the properties you want. (It's hard to say, since, as mentioned, you've left the problem fairly open.)
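
For illustration only (my own example, in Python rather than the pseudocode above), a weight that grows exponentially with the counter gives newer values proportionally more influence and has a tunable characteristic duration; as Answer 2 explains, though, the relative weights of past values are fixed once assigned:

import math

def calculate_weight_from_counter(counter, duration=100):
    # A value `duration` steps older carries 1/e of the relative weight of the
    # newest one. For very long runs this overflows; periodically rescaling
    # numerator, denominator and the exponent would be needed to avoid that.
    return math.exp(counter / duration)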




Answer 5:


This is too long to post in a comment, but it may be useful to know.

Suppose you have: w_0*v_n + ... + w_n*v_0 (we'll call this w[0..n]*v[n..0] for short)

Then the next step is: w_0*v_(n+1) + ... + w_(n+1)*v_0 (and this is w[0..n+1]*v[n+1..0] for short)

This means we need a way to calculate w[1..n+1]*v[n..0] from w[0..n]*v[n..0].

It's certainly possible that v[n..0] is 0, ..., 0, z, 0, ..., 0 where z is at some location x.

If we don't have any 'extra' storage, then f(z*w(x))=z*w(x + 1) where w(x) is the weight for location x.

Rearranging the equation, w(x + 1) = f(z*w(x))/z. Well, w(x + 1) better be constant for a constant x, so f(z*w(x))/z better be constant. Hence, f must let z propagate -- that is, f(z*w(x)) = z*f(w(x)).

But here again we have an issue. Note that if z (which could be any number) can propagate through f, then w(x) certainly can too, so we also have f(z*w(x)) = w(x)*f(z). Equating the two forms gives z*f(w(x)) = w(x)*f(z), i.e. f(z)/z = f(w(x))/w(x) for every z. So f(z)/z is the same constant, call it c, for every input.

So, f(x) = c*x where c is a constant, whenever x is a weight value.

So w(x+1) = c*w(x).

That is, each weight is a multiple of the previous. Thus, the weights take the form w(x)=m*b^x.

Note that this assumes the only information f has is the last aggregated value. Note that at some point you will be reduced to this case unless you are willing to store a non-constant amount of data representing your input. You cannot represent an infinite-length vector of real numbers with a single real number, though you can approximate it somehow in a constant, finite amount of storage. But this would merely be an approximation.

Although I haven't rigorously proven it, it is my conclusion that what you want is impossible to do with a high degree of precision, but you may be able to use log(n) space (which may as well be O(1) for many practical applications) to generate a quality approximation. You may be able to use even less.
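
For concreteness, the w(x) = m*b^x family derived above is just a normalized exponential moving average. A sketch in Python (the time constant tau is my own parameterization, matching the "characteristic duration" requested in EDIT 1):

import math

def make_exponential_averager(tau):
    b = math.exp(-1.0 / tau)   # per-step decay applied to all older weights
    dot = 0.0                  # running weighted sum of values
    wsum = 0.0                 # running sum of weights

    def add(value):
        nonlocal dot, wsum
        dot = b * dot + value
        wsum = b * wsum + 1.0
        return dot / wsum      # normalized, so early results are not biased low

    return add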




Answer 6:


I tried to actually code something (in Java). As has been said, your goal is not exactly achievable. You can only compute the average from some number of last remembered values. If you don't need it to be exact, you can approximate the older values. I tried to do that by remembering the last 5 values exactly, and remembering older values only SUMmed in groups of 5, keeping the last 5 such SUMs. Then the complexity is O(2n) memory for covering the last n + n*n values. This is a very rough approximation.

You can modify the "lastValues" and "lastAggregatedSums" array sizes as you want. See the ASCII-art picture below, which tries to show a graph of the last values: the first columns (older data) are remembered only as an aggregated value (not individually), and only the most recent 5 values are remembered individually.

values:
            #####
            #####       #####        #
      ##### #####       #####        #  #
      ##### ##### ##### #####       ## ##
      ##### ##### ##### ##### ##### #####
time: --->

Challenge 1: My example doesn't use weights, but I think it shouldn't be a problem for you to add weights for the "lastAggregatedSums" appropriately - the only problem is that if you want lower weights for older values, it becomes harder, because the array rotates, so it is not straightforward to know which weight belongs to which array member. Maybe you can modify the algorithm to always "shift" values in the array instead of rotating? Then adding weights shouldn't be a problem.

Challenge 2: The arrays are initialized with 0 values, and those zeros count toward the average from the beginning, even before we have received enough real values. If you run the algorithm for a long time, you probably won't mind that it is "learning" for some time at the beginning. If you do, you can post a modification ;-)

public class AverageCounter {
    private float[] lastValues = new float[5];
    private float[] lastAggregatedSums = new float[5];
    private int valIdx = 0;
    private int aggValIdx = 0;
    private float avg;

    public void add(float value) {
        lastValues[valIdx++] = value;
        if(valIdx == lastValues.length) {
            // sum the last values and save the sum into the aggregated array
            float sum = 0;
            for(float v: lastValues) {sum += v;}
            lastAggregatedSums[aggValIdx++] = sum;
            if(aggValIdx >= lastAggregatedSums.length) {
                // rotate aggregated values index
                aggValIdx = 0;
            }
            // clear the individual values so they are not counted twice
            // (they now live in the aggregated sum)
            java.util.Arrays.fill(lastValues, 0f);
            valIdx = 0;
        }
        // average over the individual values plus the aggregated sums,
        // treating each aggregated sum as lastValues.length values
        float sum = 0;
        for(float v: lastValues) {sum += v;}
        for(float v: lastAggregatedSums) {sum += v;}
        avg = sum / (lastValues.length + lastAggregatedSums.length * lastValues.length);
    }

    public float getAvg() {
        return avg;
    }
}



Answer 7:


You can combine (as a weighted sum) exponential means with different effective window sizes (N) in order to get the desired weights. Use more exponential means to define your weight profile in more detail. (More exponential means also means storing and calculating more values, so there is a trade-off.)
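
A sketch of that idea in Python (names, parameters and the mixing scheme are my own illustration): mix several normalized exponential means with different time constants to shape a flatter, more step-like weight profile.

import math

def make_combined_averager(taus, mix):
    # taus: effective window sizes of the individual exponential means
    # mix:  contribution of each mean to the final average (should sum to 1)
    states = [[math.exp(-1.0 / t), 0.0, 0.0] for t in taus]  # [decay, dot, wsum]

    def add(value):
        result = 0.0
        for i, (b, dot, wsum) in enumerate(states):
            dot = b * dot + value
            wsum = b * wsum + 1.0
            states[i] = [b, dot, wsum]
            result += mix[i] * (dot / wsum)
        return result

    return add

# >>> avg = make_combined_averager([5, 50], [0.5, 0.5])
# >>> avg(10.0)
# 10.0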



Source: https://stackoverflow.com/questions/9915653/how-to-calculate-iteratively-the-running-weighted-average-so-that-last-values-to
