I have a matrix in which each row is a sample from a distribution. I want to do a rolling comparison of the distributions using ks.test and save the test statistic.
One source of speedup is to write a smaller version of ks.test that does less. ks.test2 below is more restrictive than ks.test. It assumes, for example, that you have no missing values and that you always want the statistic associated with a two-sided test.
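ks.test2 <- function(x, y) {
  n.x <- length(x)
  n.y <- length(y)
  w <- c(x, y)
  # Walk the pooled sample in sorted order: step up 1/n.x for an x, down 1/n.y for a y
  z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))
  # The two-sided statistic is the largest absolute gap between the two empirical CDFs
  max(abs(z))
}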
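To see why the running sum gives the statistic, here is a small sketch that compares it with the gap between the two empirical CDFs evaluated at every pooled point. The helper name ks.ecdf and the toy vectors are made up for illustration and are not part of the function above; the data contain no ties.

```r
# Hypothetical helper, for comparison only: computes the two-sided statistic
# directly from the two empirical CDFs evaluated at every pooled point.
ks.ecdf <- function(x, y) {
  w <- c(x, y)
  max(abs(ecdf(x)(w) - ecdf(y)(w)))
}

# Toy data without ties (made up for this example).
x <- c(0.1, 0.4, 0.7)
y <- c(0.2, 0.3, 0.9, 1.1)

# Sorted pooled sample: 0.1(x) 0.2(y) 0.3(y) 0.4(x) 0.7(x) 0.9(y) 1.1(y)
# Steps:                +1/3   -1/4   -1/4   +1/3   +1/3   -1/4   -1/4
steps <- ifelse(order(c(x, y)) <= length(x), 1/length(x), -1/length(y))
z <- cumsum(steps)

max(abs(z))    # statistic from the running sum
ks.ecdf(x, y)  # statistic from the empirical CDFs
```

Both calls should return 0.5 here: after each sorted point, the running sum equals the difference between the two empirical CDFs at that point.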
Verify that the output is consistent with ks.test.
set.seed(999)
x <- rnorm(400)
y <- rnorm(400)
ks.test(x, y)$statistic
D
0.045
ks.test2(x, y)
[1] 0.045
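The check above uses continuous data, where repeated values are essentially impossible. If your samples can contain ties, the running sum has to be read only at the last position of each distinct value, otherwise tied x and y observations are stepped through one at a time and the gap is overstated. The following is a minimal sketch of a tie-aware variant under that assumption; the name ks.test3 is made up here.

```r
# Hypothetical tie-aware variant (the name ks.test3 is made up for this sketch).
ks.test3 <- function(x, y) {
  n.x <- length(x)
  n.y <- length(y)
  w <- c(x, y)
  z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))
  # Keep the running sum only at the last position of each distinct value,
  # so all tied observations are counted before the gap is measured.
  keep <- c(which(diff(sort(w)) != 0), n.x + n.y)
  max(abs(z[keep]))
}

# Two identical samples: the two empirical CDFs coincide, so the statistic is 0.
a <- c(1, 2)
b <- c(1, 2)
ks.test3(a, b)
```

For these two identical samples the version without the tie step would report 0.5 rather than 0, so the extra line matters whenever the data are rounded or discrete. It costs a sort, so it is worth benchmarking before using it in a tight loop.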
Now determine the savings from the smaller function:
library(microbenchmark)
microbenchmark(
  ks.test(x, y),
  ks.test2(x, y)
)
Unit: microseconds
           expr      min       lq      mean   median        uq      max neval cld
  ks.test(x, y) 1030.238 1070.303 1347.3296 1227.207 1313.8490 6338.918   100   b
 ks.test2(x, y)  709.719  730.048  832.9532  833.861  888.5305 1281.284   100  a
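To connect this back to the matrix in the question, here is a minimal sketch of applying the smaller function across rows. It assumes that "rolling" means comparing each row with the next one; the matrix m, its dimensions, and the name stats are made up for illustration.

```r
# Same function as above, repeated so the snippet runs on its own.
ks.test2 <- function(x, y) {
  n.x <- length(x)
  n.y <- length(y)
  w <- c(x, y)
  z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))
  max(abs(z))
}

# Made-up data: 50 samples (rows), each of size 400.
set.seed(1)
m <- matrix(rnorm(50 * 400), nrow = 50)

# Assumption: a rolling comparison pairs row i with row i + 1.
idx <- seq_len(nrow(m) - 1)
stats <- vapply(idx, function(i) ks.test2(m[i, ], m[i + 1, ]), numeric(1))

head(stats)
```

vapply returns a plain numeric vector with one statistic per adjacent pair of rows, so stats has nrow(m) - 1 elements. If the comparison you need is different (for example, each row against a fixed reference row), only the indexing inside the anonymous function changes.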