I have a matrix in which each row is a sample from a distribution. I want to do a rolling comparison of the distributions using ks.test and save the test statis
I was able to compute the pairwise Kolmogorov-Smirnov statistic using ks.test() with rollapplyr() from the zoo package.
library(zoo)  # provides rollapplyr()

# KS statistic for each pair of consecutive rows of `big`
results <- rollapplyr(data = big,
                      width = 2,
                      FUN = function(x) ks.test(x[1, ], x[2, ])$statistic,
                      by.column = FALSE)
This gets the expected result, but it's slow for a dataset of your size. This may be because ks.test() computes a lot more than just the statistic at each iteration; it also computes the p-value and does a lot of error checking.
Indeed, if we simulate a large dataset like so:
# 300000 rows (samples) by 400 columns, filled column by column
big <- matrix(rnorm(300000 * 400), nrow = 300000, ncol = 400)
The rollapplyr() solution takes a long time; I halted execution after about 2 hours, at which point it had computed most, but not all, of the results.
So while rollapplyr() is likely faster than a for loop, it is unlikely to be the best-performing solution overall.
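If the overhead really does come from the p-value and the input checks, one thing to try is computing only the two-sample statistic yourself. Below is a minimal sketch of that idea, not a benchmarked solution: ks_stat() is a helper name I made up, and `small` is a toy matrix standing in for your data. I have not timed it on the full 300000-row matrix.

```r
# Two-sample KS statistic: the largest absolute gap between the two
# empirical CDFs. ks_stat() is a hypothetical helper, not part of any package.
ks_stat <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  w <- c(x, y)
  # Walk through the pooled values in sorted order: step up by 1/n_x at
  # each value from x, step down by 1/n_y at each value from y.
  z <- cumsum(ifelse(order(w) <= n_x, 1 / n_x, -1 / n_y))
  # With ties, only look at the gap after the last value of each tied group.
  z <- z[c(which(diff(sort(w)) != 0), n_x + n_y)]
  max(abs(z))
}

# Toy data (assumption: rows are samples, as in the question)
set.seed(1)
small <- matrix(rnorm(5 * 50), nrow = 5)

# Statistic for each pair of consecutive rows
d <- vapply(seq_len(nrow(small) - 1),
            function(i) ks_stat(small[i, ], small[i + 1, ]),
            numeric(1))
```

On the toy matrix you can check the values against ks.test(small[i, ], small[i + 1, ])$statistic before trusting it on the real data. Whether this is fast enough at your scale is something you would need to measure.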