How to apply the wilcox.test to a whole dataframe in R?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-01 02:11:59

Question


I have a data frame with one grouping factor (the first column) that has more than two levels, and several columns of data. I want to apply wilcox.test to the whole data frame to compare each variable between every pair of groups. How can I do this?

UPDATE: I know that wilcox.test only tests for a difference between two groups and that my data frame contains three. But I am more interested in how to do this than in which test to use. Most likely one group will be removed, but I have not decided on that yet, so I want to test all variants.

Here is a sample:

structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), var1 = c(9.3, 
9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84, 6.98, 
6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68), var2 = c(11L, 
11L, 10L, 1L, 3L, 7L, 11L, 11L, 11L, 11L, 4L, 1L, 1L, 1L, 2L, 
2L, 1L, 4L, 8L, 8L, 1L), var3 = c(7L, 11L, 3L, 7L, 11L, 2L, 11L, 
5L, 11L, 11L, 5L, 11L, 11L, 2L, 9L, 9L, 3L, 8L, 11L, 11L, 2L), 
    var4 = c(11L, 11L, 11L, 11L, 6L, 11L, 11L, 11L, 10L, 7L, 
    11L, 2L, 11L, 3L, 11L, 11L, 6L, 11L, 1L, 11L, 11L), var5 = c(11L, 
    1L, 2L, 2L, 11L, 11L, 1L, 10L, 2L, 11L, 1L, 3L, 11L, 11L, 
    8L, 8L, 11L, 11L, 11L, 2L, 9L)), .Names = c("group", "var1", 
"var2", "var3", "var4", "var5"), class = "data.frame", row.names = c(NA, 
-21L))

UPDATE

Thanks to everyone for all answers!


Answer 1:


Updating my answer to work across columns

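# Run a Wilcoxon rank-sum test for every pair of groups on one column.
test.fun <- function(dat, col) {

  # each column of c1 holds one pair of group labels
  c1 <- combn(unique(dat$group), 2)
  sigs <- list()
  for (i in seq_len(ncol(c1))) {
    sigs[[i]] <- wilcox.test(
                   dat[dat$group == c1[1, i], col],
                   dat[dat$group == c1[2, i], col]
                 )
  }
  names(sigs) <- paste("Group", c1[1, ], "by Group", c1[2, ])

  # collect the W statistic and p-value of each test in one data frame
  tests <- data.frame(Test = names(sigs),
                      W = unlist(lapply(sigs, function(x) x$statistic)),
                      p = unlist(lapply(sigs, function(x) x$p.value)),
                      row.names = NULL)

  return(tests)
}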


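# dat is the data frame from the question; run test.fun on every column except group
tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x))
names(tests) <- colnames(dat)[-1]
# tests <- do.call(rbind, tests)  # optional: combine the list into a single data frame

# Timing for one pass over all columns. rep() evaluates its first argument only
# once, so wrapping the call in rep(..., 10000) would still time a single run:
system.time(
  tests <- lapply(colnames(dat)[-1],function(x) test.fun(dat,x))
)

#   user  system elapsed 
#  0.056   0.000   0.053 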

And the result:

tests

$var1
                Test  W          p
1 Group 1 by Group 2 28 0.36596737
2 Group 1 by Group 3 39 0.05927406
3 Group 2 by Group 3 38 0.27073136

$var2
                Test    W         p
1 Group 1 by Group 2 19.0 0.8205958
2 Group 1 by Group 3 36.5 0.1159945
3 Group 2 by Group 3 40.5 0.1522726

$var3
                Test    W         p
1 Group 1 by Group 2 13.0 0.2425786
2 Group 1 by Group 3 23.5 1.0000000
3 Group 2 by Group 3 41.0 0.1261647

$var4
                Test  W         p
1 Group 1 by Group 2 26 0.4323470
2 Group 1 by Group 3 30 0.3729664
3 Group 2 by Group 3 29 0.9479518

$var5
                Test    W         p
1 Group 1 by Group 2 24.0 0.7100968
2 Group 1 by Group 3 19.0 0.5324295
3 Group 2 by Group 3 17.5 0.2306609
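
The p-values above are unadjusted. As a minimal sketch of one way to gather every test into a single table and add an adjusted column, the block below repeats the pairwise loop in compact form so that it runs on its own. It uses only var1 and var2 from the question's sample; the names grp.pairs, vars, and long are illustrative, and adjusting across all tests at once (rather than within each variable) is an assumption about what counts as the family of tests.

```r
# Subset of the question's sample data (group, var1, var2 only).
dat <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68),
  var2 = c(11, 11, 10, 1, 3, 7, 11, 11, 11, 11, 4, 1, 1, 1, 2, 2, 1, 4, 8,
           8, 1)
)

grp.pairs <- combn(unique(dat$group), 2, simplify = FALSE)
vars <- setdiff(names(dat), "group")

# Build one row per (variable, group pair).
# suppressWarnings() hides the "cannot compute exact p-value with ties" warning.
long <- do.call(rbind, lapply(vars, function(v) {
  do.call(rbind, lapply(grp.pairs, function(pr) {
    res <- suppressWarnings(
      wilcox.test(dat[dat$group == pr[1], v], dat[dat$group == pr[2], v])
    )
    data.frame(variable = v,
               test = paste("Group", pr[1], "by Group", pr[2]),
               W = unname(res$statistic),
               p = res$p.value)
  }))
}))

# Holm adjustment across every test in the table (assumed family of tests).
long$p.holm <- p.adjust(long$p, method = "holm")
long
```

To adjust within each variable instead, p.adjust could be applied separately to each variable's rows, for example with ave().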



Answer 2:


The pairwise.wilcox.test function seems like it would be useful here; perhaps like this?

out <- lapply(2:6, function(x) pairwise.wilcox.test(d[[x]], d$group))
names(out) <- names(d)[2:6]
out

If you just want the p-values, you can go through and extract those and make a matrix.

sapply(out, function(x) {
    p <- x$p.value
    n <- outer(rownames(p), colnames(p), paste, sep='v')
    p <- as.vector(p)
    names(p) <- n
    p
})
##         var1      var2      var3 var4      var5
## 2v1 0.5414627 0.8205958 0.4851572    1 1.0000000
## 3v1 0.1778222 0.3479835 1.0000000    1 1.0000000
## 2v2        NA        NA        NA   NA        NA
## 3v2 0.5414627 0.3479835 0.3784941    1 0.6919826

Also note that pairwise.wilcox.test adjusts for multiple comparisons using the Holm method by default; if you'd rather do something different, look at the p.adjust.method parameter.
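
To illustrate how the adjustment method changes the result, here is a small sketch that runs the same pairwise tests on var1 from the question's sample with the default Holm correction, with Bonferroni, and with no correction. The object names holm, bonf, and raw are illustrative only.

```r
# var1 and the grouping column from the question's sample data.
d <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68)
)

# Same pairwise tests, three ways of handling multiple comparisons.
# suppressWarnings() hides the warnings about ties in the data.
holm <- suppressWarnings(pairwise.wilcox.test(d$var1, d$group))
bonf <- suppressWarnings(
  pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "bonferroni"))
raw <- suppressWarnings(
  pairwise.wilcox.test(d$var1, d$group, p.adjust.method = "none"))

holm$p.value  # default: Holm
bonf$p.value  # Bonferroni
raw$p.value   # unadjusted
```

With p.adjust.method = "none", the p-values should correspond to those from separate wilcox.test calls on each pair of groups.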




Answer 3:


You can loop over the columns using apply and then pass the columns to whatever test you want to use using an anonymous function, like so (assuming the data frame is named df):

apply(df[-1],2,function(x) kruskal.test(x,df$group))

Note: I used the Kruskal-Wallis test because that works on multiple groups. The above would work just as well using the Wilcoxon test if there were only two groups.
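
If only the p-values are of interest, one possible sketch is to pull them out with sapply, which gives a named numeric vector with one entry per variable. The example uses var1 and var2 from the question's sample, and kw.p is an illustrative name.

```r
# Subset of the question's sample data (group, var1, var2 only).
df <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68),
  var2 = c(11, 11, 10, 1, 3, 7, 11, 11, 11, 11, 4, 1, 1, 1, 2, 2, 1, 4, 8,
           8, 1)
)

# One Kruskal-Wallis test per data column; keep only the p-value.
kw.p <- sapply(df[-1], function(x) kruskal.test(x, df$group)$p.value)
kw.p
```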

If you do want to do pairwise Wilcoxon tests on all variables, here's a two-liner that will loop through all columns and all pairs and return the results as a list:

group.pairs <- combn(unique(df$group),2,simplify=FALSE)
# this loops over the 2nd margin - the columns - of df and makes each column
# available as x
apply(df[-1], 2, function(x)
             # this loops over the list of group pairs and makes each such pair
             # available as an integer vector y
             lapply(group.pairs, function(y)
                    # compare the values of x in the first group of the pair
                    # with the values of x in the second group
                    wilcox.test(x[df$group == y[1]], x[df$group == y[2]])))
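
As an alternative sketch, the same pairwise comparisons can be written with the formula interface of wilcox.test, subsetting the data to two groups at a time and collecting the p-values in a matrix. The example uses var1 and var2 from the question's sample; vars, two, and p.mat are illustrative names.

```r
# Subset of the question's sample data (group, var1, var2 only).
df <- data.frame(
  group = rep(1:3, times = c(6, 7, 8)),
  var1 = c(9.3, 9.05, 7.78, 7.11, 7.14, 8.12, 7.5, 7.84, 7.8, 7.52, 8.84,
           6.98, 6.1, 6.89, 6.5, 7.5, 7.8, 5.5, 6.61, 7.65, 7.68),
  var2 = c(11, 11, 10, 1, 3, 7, 11, 11, 11, 11, 4, 1, 1, 1, 2, 2, 1, 4, 8,
           8, 1)
)

group.pairs <- combn(unique(df$group), 2, simplify = FALSE)
vars <- setdiff(names(df), "group")

# Rows: group pairs; columns: variables.
p.mat <- sapply(vars, function(v) {
  f <- reformulate("group", response = v)  # builds e.g. var1 ~ group
  sapply(group.pairs, function(y) {
    two <- df[df$group %in% y, ]           # keep rows from the two groups only
    # suppressWarnings() hides the warnings about ties in the data
    suppressWarnings(wilcox.test(f, data = two))$p.value
  })
})
rownames(p.mat) <- sapply(group.pairs, paste, collapse = "v")
p.mat
```

Subsetting first means the grouping variable has exactly two levels in each call, which the formula method requires.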


Source: https://stackoverflow.com/questions/21271449/how-to-apply-the-wilcox-test-to-a-whole-dataframe-in-r
