Efficiently perform row-wise distribution test

陌清茗 2021-01-05 02:43

I have a matrix in which each row is a sample from a distribution. I want to do a rolling comparison of the distributions using ks.test and save the test statistic…

4 Answers
  •  感情败类
    2021-01-05 03:09

    One source of speed-up is to write a smaller version of ks.test that does less. ks.test2 below is more restrictive than ks.test: it assumes, for example, that there are no missing values or tied values, and that you always want the statistic for a two-sided test.

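    ks.test2 <- function(x, y){
      # Two-sided two-sample KS statistic only: assumes no NAs and no ties
      n.x <- length(x)
      n.y <- length(y)
      w <- c(x, y)
      # Walk the pooled sample in sorted order: each x-value adds 1/n.x and
      # each y-value subtracts 1/n.y, so z tracks the gap between the two ECDFs
      z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))
      # The statistic D is the largest absolute gap
      max(abs(z))
    }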
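    For reference, the value ks.test2 returns is the two-sample Kolmogorov-Smirnov statistic: the largest absolute gap between the two empirical CDFs. Walking through the pooled sample in sorted order, each x-value raises the gap by 1/n_x and each y-value lowers it by 1/n_y, so (when there are no ties) the cumulative sum z equals the gap evaluated at each pooled point:

    ```latex
    \hat F_x(t) = \frac{1}{n_x}\,\#\{i : x_i \le t\}, \qquad
    \hat F_y(t) = \frac{1}{n_y}\,\#\{j : y_j \le t\}, \qquad
    D_{n_x, n_y} = \sup_t \left| \hat F_x(t) - \hat F_y(t) \right|
    ```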
    

    Verify that the output is consistent with ks.test.

    set.seed(999)
    x <- rnorm(400)
    y <- rnorm(400)
    
    ks.test(x, y)$statistic
    
        D 
    0.045
    
    ks.test2(x, y)
    
    [1] 0.045
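    One more restriction worth knowing about: with tied values, ks.test2 can overstate the statistic, because the cumulative sum can also peak inside a run of equal values. As far as I can tell from the source of stats:::ks.test.default, ks.test handles this by keeping the cumulative sum only at the last position of each tied run. The sketch below adds that step back; ks.test2.ties is a hypothetical name, and it still assumes no missing values.

    ```r
    # Hypothetical variant of ks.test2 that also handles tied values.
    # Still assumes no NAs and returns only the two-sided statistic D.
    ks.test2.ties <- function(x, y) {
      n.x <- length(x)
      n.y <- length(y)
      w <- c(x, y)
      z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))
      if (anyDuplicated(w) > 0) {
        # Keep z only at the last element of each run of equal sorted values,
        # i.e. at points where both ECDFs have fully stepped
        z <- z[c(which(diff(sort(w)) != 0), n.x + n.y)]
      }
      max(abs(z))
    }

    x <- c(1, 2, 2, 3)
    y <- c(2, 3, 3, 4)
    ks.test2.ties(x, y)
    # [1] 0.5
    ```

    On these two vectors ks.test2 returns 0.75, because it also counts the gap partway through the run of 2s. The extra anyDuplicated() check costs little when the data are continuous.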
    

    Now measure how much time the smaller function saves:

    library(microbenchmark)
    
    microbenchmark(
      ks.test(x, y),
      ks.test2(x, y)
      )
    
    Unit: microseconds
               expr      min       lq      mean   median        uq      max neval cld
      ks.test(x, y) 1030.238 1070.303 1347.3296 1227.207 1313.8490 6338.918   100   b
     ks.test2(x, y)  709.719  730.048  832.9532  833.861  888.5305 1281.284   100  a 
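    To connect this back to the question, here is one way to apply ks.test2 to the rows of a matrix. It assumes "rolling" means comparing each row with the next one; rolling_ks and the example matrix m are hypothetical, and ks.test2 is repeated so the snippet runs on its own. vapply keeps the result a plain numeric vector with one statistic per adjacent pair of rows.

    ```r
    # Same restricted statistic as ks.test2 above (no NAs, no ties, two-sided D)
    ks.test2 <- function(x, y) {
      n.x <- length(x)
      n.y <- length(y)
      w <- c(x, y)
      z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))
      max(abs(z))
    }

    # Hypothetical helper: compares each row of `m` with the next row and
    # returns the nrow(m) - 1 statistics as a numeric vector
    rolling_ks <- function(m, stat = ks.test2) {
      vapply(seq_len(nrow(m) - 1L),
             function(i) stat(m[i, ], m[i + 1L, ]),
             numeric(1))
    }

    # Made-up example data: 10 rows, each a sample of 400 normal draws
    set.seed(1)
    m <- matrix(rnorm(10 * 400), nrow = 10)
    d <- rolling_ks(m)
    d
    ```

    Passing stat = function(a, b) ks.test(a, b)$statistic should give the same values through ks.test, which is a quick way to check the faster version on your own data before relying on it.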
    
