Efficiently perform row-wise distribution test

陌清茗 2021-01-05 02:43

I have a matrix in which each row is a sample from a distribution. I want to do a rolling comparison of the distributions using ks.test and save the test statistic

4 Answers
  •  醉酒成梦
    2021-01-05 03:15

    I was able to compute the pairwise Kolmogorov-Smirnov statistic using ks.test() with rollapplyr() from the zoo package.

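    library(zoo)  # provides rollapplyr()

    # Each window is 2 consecutive rows; by.column = FALSE passes them to FUN as one matrix
    results <- rollapplyr(data = big,
                          width = 2,
                          FUN = function(x) ks.test(x[1, ], x[2, ])$statistic,
                          by.column = FALSE)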
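
    To make explicit what each window contains, the same rolling comparison can be sketched in base R without zoo. The helper name `rolling_ks_base` and the small `demo` matrix are hypothetical, for illustration only; the sketch should give the same statistics as the rollapplyr() call, one per pair of consecutive rows.

```r
# Compare row i with row i + 1 for every consecutive pair of rows.
# Assumes `big` is a numeric matrix whose rows are samples.
rolling_ks_base <- function(big) {
  n_windows <- max(nrow(big) - 1, 0)
  vapply(seq_len(n_windows), function(i) {
    unname(ks.test(big[i, ], big[i + 1, ])$statistic)
  }, numeric(1))
}

set.seed(1)
demo <- matrix(rnorm(5 * 100), nrow = 5)
rolling_ks_base(demo)  # 4 statistics, one per consecutive row pair
```

    This still calls ks.test() once per window, so it carries the same per-call overhead; it only removes the zoo dependency.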
    

    This gets the expected result, but it is very slow for a dataset of your size. This may be because ks.test() computes much more than the statistic at each iteration: it also computes the p-value and performs extensive input checking.
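
    One way to avoid that overhead is to compute only the two-sample statistic D = max |F_x(t) - F_y(t)|, the largest gap between the two empirical CDFs, evaluated at every observed value. This is a sketch and not part of the original answer: `ks_stat` and `rolling_ks_stat` are hypothetical helpers, and while the statistic matches ks.test() on a small matrix, how much faster it runs on 300,000 rows has not been measured here.

```r
# Two-sample Kolmogorov-Smirnov statistic only (no p-value, no input checks).
ks_stat <- function(x, y) {
  xs <- sort(x)
  ys <- sort(y)
  pooled <- c(xs, ys)                          # both ECDFs only jump at these points
  fx <- findInterval(pooled, xs) / length(xs)  # F_x(t): share of x values <= t
  fy <- findInterval(pooled, ys) / length(ys)  # F_y(t): share of y values <= t
  max(abs(fx - fy))
}

# Rolling version: statistic for each pair of consecutive rows of `m`.
rolling_ks_stat <- function(m) {
  n_windows <- max(nrow(m) - 1, 0)
  vapply(seq_len(n_windows),
         function(i) ks_stat(m[i, ], m[i + 1, ]),
         numeric(1))
}

set.seed(2)
demo <- matrix(rnorm(10 * 200), nrow = 10)
rolling_ks_stat(demo)
```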

    Indeed, if we simulate a large dataset like so:

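    # 300,000 rows x 400 columns of standard normal draws, allocated in one call
    # (column-major fill, so this matches the values of the original cbind() loop)
    big <- matrix(rnorm(300000 * 400), nrow = 300000, ncol = 400)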
    

    The rollapplyr() solution takes a long time; I halted execution after about 2 hours, at which point it had computed nearly all (but not all) results.
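
    As a rough back-of-envelope check using only the figures above: a width-2 window over 300,000 rows means 299,999 ks.test() calls, so about 2 hours works out to roughly 24 ms per call. Since not every window had finished, the true per-call cost is somewhat higher than this estimate.

```r
n_rows      <- 300000                  # rows in the simulated `big` matrix
n_windows   <- n_rows - 1              # width = 2: one ks.test() call per consecutive row pair
elapsed_s   <- 2 * 60 * 60             # "about 2 hours", as reported above
per_call_ms <- 1000 * elapsed_s / n_windows
round(per_call_ms, 1)                  # about 24 ms per window
```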

    So while rollapplyr() is likely faster than a for loop, it is unlikely to be the best overall solution in terms of performance.
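
    One further idea, not tested at full scale: each row takes part in two windows, so it can be sorted once up front instead of once per comparison. `rolling_ks_presorted` is a hypothetical helper; it assumes every row has the same length and contains no NA values, and any speed-up over the per-pair version is an expectation rather than a measurement.

```r
rolling_ks_presorted <- function(m) {
  # Column i of `s` is row i of `m`, sorted once; matrix() keeps the shape
  # even when `m` has a single column.
  s <- matrix(apply(m, 1, sort), ncol = nrow(m))
  n <- nrow(s)                                  # sample size per row (all rows equal)
  n_windows <- max(ncol(s) - 1, 0)
  vapply(seq_len(n_windows), function(i) {
    a <- s[, i]
    b <- s[, i + 1]
    pooled <- c(a, b)
    # Equal sample sizes, so the ECDF gap is a difference of counts divided by n
    max(abs(findInterval(pooled, a) - findInterval(pooled, b))) / n
  }, numeric(1))
}

set.seed(3)
demo <- matrix(rnorm(8 * 150), nrow = 8)
rolling_ks_presorted(demo)
```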
