Computing Standard Deviation in a stream

后端 未结 3 737
梦毁少年i
梦毁少年i 2020-12-01 00:52

Using Python, assume I\'m running through a known quantity of items I, and have the ability to time how long it takes to process each one t, as wel

3条回答
  •  鱼传尺愫
    2020-12-01 01:30

    As outlined in the Wikipedia article on the standard deviation, it is enough to keep track of the following three sums:

    s0 = sum(1 for x in samples)
    s1 = sum(x for x in samples)
    s2 = sum(x*x for x in samples)
    

    These sums are easily updated as new values arrive. The standard deviation can be calculated as

    std_dev = math.sqrt((s0 * s2 - s1 * s1)/(s0 * (s0 - 1)))
    

    Note that this way of computing the standard deviation can be numerically ill-conditioned if your samples are floating point numbers and the standard deviation is small compared to the mean of the samples. If you expect samples of this type, you should resort to Welford's method (see the accepted answer).

提交回复
热议问题