Question
I have a data frame with one grouping factor (the first column) that has more than two levels, plus several columns of data. I want to apply wilcox.test to the whole data frame so that, for each variable, every group is compared with each of the others. How can I do this?
UPDATE: I know that wilcox.test only tests for a difference between two groups, and my data frame contains three. But I am more interested in how to do this than in which test to use. Most likely one group will be removed, but I have not decided on that yet, so I want to test all variants.
Here is a sample:
structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), var1 = c(9.3,
9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84, 6.98,
6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68), var2 = c(11L,
11L, 10L, 1L, 3L, 7L, 11L, 11L, 11L, 11L, 4L, 1L, 1L, 1L, 2L,
2L, 1L, 4L, 8L, 8L, 1L), var3 = c(7L, 11L, 3L, 7L, 11L, 2L, 11L,
5L, 11L, 11L, 5L, 11L, 11L, 2L, 9L, 9L, 3L, 8L, 11L, 11L, 2L),
var4 = c(11L, 11L, 11L, 11L, 6L, 11L, 11L, 11L, 10L, 7L,
11L, 2L, 11L, 3L, 11L, 11L, 6L, 11L, 1L, 11L, 11L), var5 = c(11L,
1L, 2L, 2L, 11L, 11L, 1L, 10L, 2L, 11L, 1L, 3L, 11L, 11L,
8L, 8L, 11L, 11L, 11L, 2L, 9L)), .Names = c("group", "var1",
"var2", "var3", "var4", "var5"), class = "data.frame", row.names = c(NA,
-21L))
UPDATE
Thanks to everyone for all answers!
Answer 1:
Updated answer that works across all columns (the data frame is assumed to be named dat):
test.fun <- function(dat, col) {
  # all pairs of group levels, one pair per column of c1
  c1 <- combn(unique(dat$group), 2)
  sigs <- list()
  for (i in 1:ncol(c1)) {
    sigs[[i]] <- wilcox.test(
      dat[dat$group == c1[1, i], col],
      dat[dat$group == c1[2, i], col]
    )
  }
  names(sigs) <- paste("Group", c1[1, ], "by Group", c1[2, ])
  # collect the W statistic and p-value of each test in a data frame
  tests <- data.frame(Test = names(sigs),
                      W = unlist(lapply(sigs, function(x) x$statistic)),
                      p = unlist(lapply(sigs, function(x) x$p.value)),
                      row.names = NULL)
  return(tests)
}
tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x))
names(tests) <- colnames(dat)[-1]
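# tests <- do.call(rbind, tests)  # optionally combines the list into one data frame
# Timing reported by the author. Note that rep() evaluates its first argument
# only once and then repeats the result, so this measures a single run, not
# 10000 runs; use replicate(10000, ...) to time repeated evaluation.
system.time(
  rep(
    tests <- lapply(colnames(dat)[-1], function(x) test.fun(dat, x)), 10000
  )
)
#    user  system elapsed
#   0.056   0.000   0.053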
And the result:
tests
$var1
Test W p
1 Group 1 by Group 2 28 0.36596737
2 Group 1 by Group 3 39 0.05927406
3 Group 2 by Group 3 38 0.27073136
$var2
Test W p
1 Group 1 by Group 2 19.0 0.8205958
2 Group 1 by Group 3 36.5 0.1159945
3 Group 2 by Group 3 40.5 0.1522726
$var3
Test W p
1 Group 1 by Group 2 13.0 0.2425786
2 Group 1 by Group 3 23.5 1.0000000
3 Group 2 by Group 3 41.0 0.1261647
$var4
Test W p
1 Group 1 by Group 2 26 0.4323470
2 Group 1 by Group 3 30 0.3729664
3 Group 2 by Group 3 29 0.9479518
$var5
Test W p
1 Group 1 by Group 2 24.0 0.7100968
2 Group 1 by Group 3 19.0 0.5324295
3 Group 2 by Group 3 17.5 0.2306609
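If a single table is easier to work with than a list of data frames, the same pairwise loop can collect everything into one long data frame. The following is only a sketch: it rebuilds the group, var1 and var2 columns from the question's sample, and the names all_tests and p_holm are made up for illustration. The Holm adjustment across all rows is an optional extra that is not part of the answer above.

```r
# Sample data: group, var1 and var2 copied from the question
dat <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68),
  var2 = c(11, 11, 10, 1, 3, 7, 11, 11, 11, 11, 4, 1, 1, 1, 2, 2, 1, 4, 8, 8, 1)
)

pairs <- combn(sort(unique(dat$group)), 2)   # one group pair per column
vars  <- setdiff(names(dat), "group")        # every column except the grouping one

rows <- list()
for (v in vars) {
  for (i in seq_len(ncol(pairs))) {
    a <- dat[dat$group == pairs[1, i], v]
    b <- dat[dat$group == pairs[2, i], v]
    # With ties, wilcox.test warns and falls back to a normal approximation;
    # the warning is suppressed here only to keep the output quiet.
    res <- suppressWarnings(wilcox.test(a, b))
    rows[[length(rows) + 1]] <- data.frame(
      variable = v,
      group1 = pairs[1, i],
      group2 = pairs[2, i],
      W = unname(res$statistic),
      p = res$p.value
    )
  }
}

# One row per (variable, group pair)
all_tests <- do.call(rbind, rows)

# Optional: Holm adjustment across every test in the table (an assumption
# about what you want; adjust per variable instead if that fits better)
all_tests$p_holm <- p.adjust(all_tests$p, method = "holm")
all_tests
```

Because every row carries its variable and group pair, the table can be filtered or sorted by p-value directly.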
Answer 2:
The pairwise.wilcox.test function seems like it would be useful here; perhaps like this (with the data frame named d)?
out <- lapply(2:6, function(x) pairwise.wilcox.test(d[[x]], d$group))
names(out) <- names(d)[2:6]
out
If you just want the p-values, you can go through and extract those and make a matrix.
sapply(out, function(x) {
  p <- x$p.value
  # label each cell as <row group>v<column group>
  n <- outer(rownames(p), colnames(p), paste, sep='v')
  p <- as.vector(p)
  names(p) <- n
  p
})
## var1 var2 var3 var4 var5
## 2v1 0.5414627 0.8205958 0.4851572 1 1.0000000
## 3v1 0.1778222 0.3479835 1.0000000 1 1.0000000
## 2v2 NA NA NA NA NA
## 3v2 0.5414627 0.3479835 0.3784941 1 0.6919826
Also note that pairwise.wilcox.test adjusts for multiple comparisons using the Holm method by default; if you'd rather do something different, look at the p.adjust.method argument.
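As a quick illustration of switching the adjustment, the sketch below runs the same pairwise test on var1 with three settings of p.adjust.method. It rebuilds only the group and var1 columns from the question, and the object names holm, raw and bonf are just for illustration. The valid method names are listed in the built-in vector p.adjust.methods.

```r
# Sample data: group and var1 copied from the question
d <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68)
)

# Ties in the data trigger a warning about exact p-values; it is suppressed
# here only to keep the output quiet.
holm <- suppressWarnings(pairwise.wilcox.test(d$var1, d$group))
raw  <- suppressWarnings(
  pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "none"))
bonf <- suppressWarnings(
  pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "bonferroni"))

holm$p.adjust.method   # the default adjustment
raw$p.value            # unadjusted p-values
bonf$p.value           # Bonferroni-adjusted p-values
```

With p.adjust.method = "none" the p-values are those of the individual wilcox.test calls, which makes it easier to compare this approach with the other answers.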
Answer 3:
You can loop over the columns using apply and then pass each column to whatever test you want to use via an anonymous function, like so (assuming the data frame is named df):
apply(df[-1],2,function(x) kruskal.test(x,df$group))
Note: I used the Kruskal-Wallis test because that works on multiple groups. The above would work just as well using the Wilcoxon test if there were only two groups.
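If only the p-values are needed, they can be pulled out of each test result while looping. The sketch below uses sapply over the data columns instead of apply, which is a substitution on my part: it avoids converting the data frame to a matrix first. It rebuilds the group, var1 and var2 columns from the question, and kw_p is a made-up name.

```r
# Sample data: group, var1 and var2 copied from the question
df <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68),
  var2 = c(11, 11, 10, 1, 3, 7, 11, 11, 11, 11, 4, 1, 1, 1, 2, 2, 1, 4, 8, 8, 1)
)

# One Kruskal-Wallis test per data column, keeping only the p-value;
# the result is a numeric vector named after the columns
kw_p <- sapply(df[-1], function(x) kruskal.test(x, df$group)$p.value)
kw_p
```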
If you do want to do pairwise Wilcoxon tests on all variables, here's a two-liner that will loop through all columns and all pairs and return the results as a list:
group.pairs <- combn(unique(df$group), 2, simplify = FALSE)
# this loops over the 2nd margin - the columns - of df and makes each column
# available as x
apply(df[-1], 2, function(x)
  # this loops over the list of group pairs and makes each such pair
  # available as an integer vector y
  lapply(group.pairs, function(y)
    # compare the values of the first group in the pair with those of the second
    wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))
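A pairwise comparison for one variable can also be written with the formula interface of wilcox.test, restricting the data to the two groups of each pair. This is only a sketch on the question's group and var1 columns; the names pairs, res and p_values are illustrative.

```r
# Sample data: group and var1 copied from the question
df <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68)
)

# All pairs of group levels as a list of length-2 vectors
pairs <- combn(sort(unique(df$group)), 2, simplify = FALSE)

# For each pair, keep only rows from those two groups and test var1 by group.
# Ties can trigger a warning about exact p-values; it is suppressed here only
# to keep the output quiet.
res <- lapply(pairs, function(y)
  suppressWarnings(wilcox.test(var1 ~ group, data = df[df$group %in% y, ])))
names(res) <- sapply(pairs, paste, collapse = " vs ")

# Named vector of p-values, one per group pair
p_values <- sapply(res, function(r) r$p.value)
p_values
```

The formula form requires that exactly two groups remain after subsetting, which is why each call filters the rows with %in% first.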
Source: https://stackoverflow.com/questions/21271449/how-to-apply-the-wilcox-test-to-a-whole-dataframe-in-r