My simple question is: How do you do a ks.test between two data frames column by column?
E.g., we have two data frames:
D1 <- data.fra
A tidyverse solution, using the map function from the purrr package together with the tidy function from the broom package:
library(purrr)
library(broom)
# Data posted by @TUSHAr
set.seed(12)
D1 <- data.frame(A = rnorm(n = 30, mean = 5, sd = 2.5),
                 B = rnorm(n = 30, mean = 4.5, sd = 2.2),
                 C = rnorm(n = 30, mean = 2.5, sd = 12))
D2 <- data.frame(A = rnorm(n = 30, mean = 5, sd = 2.49),
                 B = rnorm(n = 30, mean = 4.4, sd = 2.2),
                 C = rnorm(n = 30, mean = 2, sd = 12))
# Loop through each column name
result <- colnames(D1) %>%
  # name the vector by its own values so the results stay labelled
  set_names() %>%
  # apply `ks.test` function for each column pair
  map(~ ks.test(D1[, .x], D2[, .x])) %>%
  # extract test results using `tidy` then bind them together by rows
  map_dfr(., broom::tidy, .id = "parameter")
result
#> # A tibble: 3 x 5
#>   parameter statistic p.value method                             alternative
#>   <chr>         <dbl>   <dbl> <chr>                              <chr>
#> 1 A             0.167   0.808 Two-sample Kolmogorov-Smirnov t~   two-sided
#> 2 B             0.2     0.594 Two-sample Kolmogorov-Smirnov t~   two-sided
#> 3 C             0.233   0.393 Two-sample Kolmogorov-Smirnov t~   two-sided
Created on 2018-08-24 by the reprex package (v0.2.0.9000).
First, create two data.frames, D1 and D2, with random numbers and the same column names.
set.seed(12)
D1 = data.frame(A=rnorm(n = 30,mean = 5,sd = 2.5),B=rnorm(n = 30,mean = 4.5,sd = 2.2),C=rnorm(n = 30,mean = 2.5,sd = 12))
D2 = data.frame(A=rnorm(n = 30,mean = 5,sd = 2.49),B=rnorm(n = 30,mean = 4.4,sd = 2.2),C=rnorm(n = 30,mean = 2,sd = 12))
Now we can loop over the column names and use each one to index both D1 and D2, running ks.test on the corresponding pair of columns.
col.names = colnames(D1)
lapply(col.names,function(t,d1,d2){ks.test(d1[,t],d2[,t])},D1,D2)
#[[1]]
#Two-sample Kolmogorov-Smirnov test
#data: d1[, t] and d2[, t]
#D = 0.167, p-value = 0.81
#alternative hypothesis: two-sided
#[[2]]
#Two-sample Kolmogorov-Smirnov test
#data: d1[, t] and d2[, t]
#D = 0.233, p-value = 0.39
#alternative hypothesis: two-sided
#[[3]]
#Two-sample Kolmogorov-Smirnov test
#data: d1[, t] and d2[, t]
#D = 0.2, p-value = 0.59
#alternative hypothesis: two-sided
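The list returned by lapply above is unnamed, so it is not obvious which result belongs to which column. One possible way to label the results and collect the statistics into a single table is sketched below in base R. The names `tests` and `ks.summary` are made up for this illustration, and the data are the same simulated D1 and D2 as above.

```r
# Same simulated data as above
set.seed(12)
D1 <- data.frame(A = rnorm(n = 30, mean = 5, sd = 2.5),
                 B = rnorm(n = 30, mean = 4.5, sd = 2.2),
                 C = rnorm(n = 30, mean = 2.5, sd = 12))
D2 <- data.frame(A = rnorm(n = 30, mean = 5, sd = 2.49),
                 B = rnorm(n = 30, mean = 4.4, sd = 2.2),
                 C = rnorm(n = 30, mean = 2, sd = 12))

col.names <- colnames(D1)

# Run ks.test per column, then name each list element after its column
# (`tests` is a hypothetical name used only in this sketch)
tests <- lapply(col.names, function(t) ks.test(D1[, t], D2[, t]))
names(tests) <- col.names

# Pull the D statistic and the p-value out of each "htest" object
# (`ks.summary` is likewise a hypothetical name)
ks.summary <- data.frame(
  column    = col.names,
  statistic = vapply(tests, function(x) unname(x$statistic), numeric(1),
                     USE.NAMES = FALSE),
  p.value   = vapply(tests, function(x) x$p.value, numeric(1),
                     USE.NAMES = FALSE)
)
ks.summary
```

With the list named, a single test can also be looked up directly, e.g. `tests[["B"]]`.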
Using the notation from your question (data frames D and S), the following code should work:
col.names = colnames(S)
lapply(col.names,function(t,d1,d2){ks.test(d1[,t],d2[,t])},D,S)
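The call above assumes that every column name of S also exists in D. If that is not guaranteed, one option is to restrict the loop to the columns the two data frames share. The sketch below is only an illustration: the D and S built here are made-up stand-ins (the real ones from the question are not shown), and `common` and `res` are hypothetical names.

```r
# Made-up stand-ins for D and S; the column sets differ on purpose
set.seed(1)
D <- data.frame(A = rnorm(20), B = rnorm(20), C = rnorm(20))
S <- data.frame(B = rnorm(25), A = rnorm(25), E = rnorm(25))

# Keep only the column names present in both data frames
common <- intersect(colnames(D), colnames(S))

# Same lapply pattern as above, restricted to the shared columns
res <- lapply(common, function(t, d1, d2) ks.test(d1[, t], d2[, t]), D, S)
names(res) <- common
res
```

The two samples do not need the same number of rows here, since the two-sample ks.test compares the empirical distributions rather than paired observations.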