Surprisingly Slow Standard Deviation in R

后端 未结 3 2090
死守一世寂寞
死守一世寂寞 2021-01-06 09:52

I am calculating standard deviations on an expanding window where at each point I recalculate the standard deviation. This seems like a fairly straightforward thing to do t

相关标签:
3条回答
  • 2021-01-06 09:59

    Edited to fix some typos, sorry.

    This takes ~1.3 seconds on my machine:

    t0 <- proc.time()[[3]]
    x <- rnorm(8000)
    z <- sapply(1000:8000,function(y){sd(x[seq_len(y)])})
    print(proc.time()[[3]]- t0)
    

    and I'd be willing to bet there are even faster ways of doing this. Avoid explicit for loops!

    0 讨论(0)
  • 2021-01-06 10:09

    When a somewhat similar question about a cumulative variance and a cumularive kurtosis operation came up in rhelp a few days ago, here is what I offered :

    daily <- rnorm(1000000)
    mbar <- mean(daily)
    cumvar <-  cumsum( (daily-cumsum(daily)/1:length(daily) )^2)
    cumskew <- cumsum( (daily-cumsum(daily)/1:length(daily))^3)/cumvar^(3/2)
    

    It's certainly faster than the sapply method but may be comparable to Aaron's.

     system.time( cumvar <-  cumsum( (daily-cumsum(daily)/1:length(daily) )^2) )
       user  system elapsed 
      0.037   0.026   0.061 
     system.time(cumsd <- sqrt(cumvar) )
       user  system elapsed 
      0.009   0.005   0.013 
    
    0 讨论(0)
  • 2021-01-06 10:20

    You might also try an algorithm that updates the standard deviation (well, actually, the sum of squares of differences from the mean) as you go. On my system this reduces the time from ~0.8 seconds to ~0.002 seconds.

    n <- length(x)
    m <- cumsum(x)/(1:n)
    m1 <- c(NA,m[1:(n-1)])
    ssd <- (x-m)*(x-m1)
    v <- c(0,cumsum(ssd[-1])/(1:(n-1)))
    z <- sqrt(v)
    

    See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for details.

    Also see the answers to this question: Efficient calculation of matrix cumulative standard deviation in r

    0 讨论(0)
提交回复
热议问题