Efficient calculation of matrix cumulative standard deviation in R

一个人的身影 2020-12-01 19:45

I recently posted this question on the r-help mailing list but got no answers, so I thought I would post it here as well and see if there were any suggestions.

I am

2 Answers
  • 2020-12-01 20:31

    Another try (Marek's is faster)

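    cumsd2 <- function(y) {
        n <- nrow(y)
        apply(y, 2, function(i) {
            # Running mean of the first z values, repeated z times
            Xmeans <- lapply(1:n, function(z) rep(sum(i[1:z]) / z, z))
            # The first z values themselves (lapply always returns a list)
            Xs <- lapply(1:n, function(z) i[1:z])
            # Sample sd of the first z values, for z = 2..n (result has n - 1 rows)
            sapply(2:n, function(z) sqrt(sum((Xs[[z]] - Xmeans[[z]])^2, na.rm = TRUE) / (z - 1)))
        })
    }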
    
  • 2020-12-01 20:44

    You could use cumsum to compute the sums required by the direct formula for variance/sd, which turns the calculation into vectorized operations on the matrix:

    cumsd_mod <- function(mat) {
        cum_var <- function(x) {
            ind_na <- !is.na(x)
            nn <- cumsum(ind_na)
            x[!ind_na] <- 0
            cumsum(x^2) / (nn-1) - (cumsum(x))^2/(nn-1)/nn
        }
        v <- sqrt(apply(mat,2,cum_var))
        v[is.na(mat) | is.infinite(v)] <- NA
        v
    }
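
The expression inside `cum_var` can be read as the usual sample variance rewritten in terms of running sums. As a sketch, write n for the running count of non-missing values (`nn` in the code) and x_1, ..., x_n for those values; this notation is introduced here for illustration and is not the answer's own:

```latex
\begin{aligned}
s_n^2 &= \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}_n\right)^2
       = \frac{1}{n-1}\left(\sum_{i=1}^{n}x_i^2 - n\,\bar{x}_n^2\right) \\
      &= \frac{\sum_{i=1}^{n}x_i^2}{n-1} - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n\,(n-1)},
\qquad \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n}x_i .
\end{aligned}
```

`cumsum(x^2)` and `cumsum(x)` supply the two sums for every prefix at once, which is why no explicit loop over rows is needed. This one-pass form can lose precision when the mean is large relative to the spread, so it may be worth checking against `sd()` on such data.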
    

    Just for comparison:

    set.seed(2765374)
    X <- matrix(rnorm(1000),100,10)
    X[cbind(1:10,1:10)] <- NA # to have some NA's
    
    all.equal(cumsd(X),cumsd_mod(X))
    # [1] TRUE
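
The check above relies on a `cumsd` function from the question, which is not reproduced on this page. To make the comparison runnable, here is a minimal sketch of a naive baseline. The name `cumsd_naive` and its NA handling (NA wherever the input is NA, mirroring `cumsd_mod`) are assumptions made for illustration, not the asker's original code:

```r
# Hypothetical baseline, NOT the asker's original `cumsd`:
# for each column, take sd() of the first k rows, for k = 1..n.
cumsd_naive <- function(mat) {
  n <- nrow(mat)
  apply(mat, 2, function(col) {
    vapply(seq_len(n), function(k) {
      # Assumption: an NA in the input gives NA in the output at that position
      if (is.na(col[k])) return(NA_real_)
      # sd() of the non-missing values seen so far; NA when fewer than two
      sd(col[seq_len(k)], na.rm = TRUE)
    }, numeric(1))
  })
}

# Small usage example on a 4 x 2 matrix with one missing value
m <- matrix(c(1, 2, 4, 7, 3, NA, 5, 9), nrow = 4)
print(cumsd_naive(m))
```

With `cumsd_mod` from this answer defined, `all.equal(cumsd_naive(X), cumsd_mod(X))` can be used in the same way as the check above. The baseline calls `sd()` once per cell, so it is expected to be far slower than the `cumsum` version on large matrices.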
    

    And about timing:

    X <- matrix(rnorm(100000),1000,100)
    system.time(cumsd(X))
    # user  system elapsed 
    # 7.94    0.00    7.97 
    system.time(cumsd_mod(X))
    # user  system elapsed 
    # 0.03    0.00    0.03 
    