Joining the result of two statistical tables in one table in R

Submitted by 北战南征 on 2019-12-20 04:54:06

Question


This is a follow-up to my earlier question, "comparison Mann-Whitney test between groups", so I decided to open a new topic.

Rui Barradas's solution helped me run the Mann-Whitney test for group 1 vs. group 2 and group 1 vs. group 3.

lst <- split(mydat, mydat$group)
lapply(lst[-1], function(DF) wilcox.test(DF$var, lst[[1]]$var, exact = FALSE))
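
The original mydat lives in the linked question and is not reproduced here. As a minimal sketch, assume a hypothetical data frame with a numeric var column and a group column holding three groups of four values each (the values are an assumption, chosen to agree with the summary statistics quoted in this thread). Under that assumption the two lines above can be run like this:

```r
# Hypothetical stand-in for the data in the linked question (an assumption):
# three groups, four observations each.
mydat <- data.frame(
  group = rep(1:3, each = 4),
  var   = rep(c(23, 23, 24, 24), times = 3)
)

# One data frame per group; lst[[1]] is the reference group.
lst <- split(mydat, mydat$group)

# Compare groups 2 and 3 against group 1.
# exact = FALSE requests the normal approximation instead of an exact p-value.
wt_list <- lapply(lst[-1], function(DF) {
  wilcox.test(DF$var, lst[[1]]$var, exact = FALSE)
})

# Pull out only the p-values; the result is named by group ("2", "3").
p_values <- sapply(wt_list, `[[`, "p.value")
print(p_values)
```

Each element of wt_list is a full htest object, so other fields such as statistic can be extracted the same way.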

Now I want to get the descriptive statistics. I use the psych library:

describeBy(mydat$var,mydat$group)

This gives the following output:

group: 1
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 4 23.5 0.58   23.5    23.5 0.74  23  24     1    0    -2.44 0.29
-------------------------------------------------------------------------------------- 
group: 2
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 4 23.5 0.58   23.5    23.5 0.74  23  24     1    0    -2.44 0.29
-------------------------------------------------------------------------------------- 
group: 3
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 4 23.5 0.58   23.5    23.5 0.74  23  24     1    0    -2.44 0.29

This is inconvenient. For each group I need only the mean, sd, median, and the p-value of wilcox.test.

That is, I want this output:

        mean    sd      median  p-value
group1  23,5    0,58    23,5    -
group2  23,5    0,58    23,5    1
group3  23,5    0,58    23,5    1

How can I do this?

Edit

structure(list(`1` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame")), `2` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame")), `3` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame"))), .Dim = 3L, .Dimnames = structure(list(group = c("1", 
"2", "3")), .Names = "group"), call = by.default(data = x, INDICES = group, 
    FUN = describe, type = type), class = c("psych", "describeBy"
))

Answer 1:


With the data posted in the linked question and the split instruction above, the following produces the desired output.

I repeat the tests in order to assign their results to wt_list.

wt_list <- lapply(lst[-1], function(DF) wilcox.test(DF$var, lst[[1]]$var, exact = FALSE))

mu <- tapply(mydat$var, mydat$group, mean)
s  <- tapply(mydat$var, mydat$group, sd)
md <- tapply(mydat$var, mydat$group, median)

pval <- c(NA, sapply(wt_list, '[[', "p.value"))

df_smry <- data.frame(mean = mu, sd = s, median = md, p.value = pval)

df_smry
#  mean        sd median p.value
#1 23.5 0.5773503   23.5      NA
#2 23.5 0.5773503   23.5       1
#3 23.5 0.5773503   23.5       1
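
The question asked for row labels such as group1 and a "-" in place of the missing p-value. A minimal sketch of that last formatting step follows; it rebuilds the summary on a hypothetical mydat (an assumption, since the real data is in the linked question), and the name out is introduced here only for illustration:

```r
# Hypothetical stand-in for the data in the linked question (an assumption).
mydat <- data.frame(
  group = rep(1:3, each = 4),
  var   = rep(c(23, 23, 24, 24), times = 3)
)

# Same steps as in the answer above.
lst <- split(mydat, mydat$group)
wt_list <- lapply(lst[-1], function(DF) {
  wilcox.test(DF$var, lst[[1]]$var, exact = FALSE)
})
df_smry <- data.frame(
  mean    = tapply(mydat$var, mydat$group, mean),
  sd      = tapply(mydat$var, mydat$group, sd),
  median  = tapply(mydat$var, mydat$group, median),
  p.value = c(NA, sapply(wt_list, `[[`, "p.value"))
)

# Presentation-only copy: label the rows, round sd, and show "-" where
# no test was run (the reference group).
out <- df_smry
rownames(out) <- paste0("group", rownames(df_smry))
out$sd <- round(out$sd, 2)
out$p.value <- ifelse(is.na(df_smry$p.value), "-",
                      as.character(signif(df_smry$p.value, 3)))
print(out)
```

Turning p.value into text is convenient for display only; keep the numeric df_smry if the values are needed for further computation.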



Answer 2:


You can try the tidyverse together with broom. tidy() returns the result of a test as a data.frame. We add the missing group values using complete(). Then we calculate the descriptive statistics using dplyr's group_by and summarise_all and join the result to the p-values. If necessary, you can filter at the end.

library(tidyverse)
mydat %>% 
  with(., pairwise.wilcox.test(var, group, exact = FALSE)) %>% 
  broom::tidy() %>% 
  complete(group1 = factor(mydat$group)) %>% 
  left_join(mydat %>% 
              group_by(group=as.character(group)) %>% 
              summarise_all(c("mean", "sd", "median")), 
            by=c("group1"="group"))
# A tibble: 4 x 6
  group1 group2 p.value  mean    sd median
  <chr>  <chr>    <dbl> <dbl> <dbl>  <dbl>
1 1      NA          NA  23.5 0.577   23.5
2 2      1            1  23.5 0.577   23.5
3 3      1            1  23.5 0.577   23.5
4 3      2            1  23.5 0.577   23.5

Then you can filter for the expected output

.Last.value %>%   
  filter(!group2 %in% 2)
# A tibble: 3 x 6
  group1 group2 p.value  mean    sd median
  <chr>  <chr>    <dbl> <dbl> <dbl>  <dbl>
1 1      NA          NA  23.5 0.577   23.5
2 2      1            1  23.5 0.577   23.5
3 3      1            1  23.5 0.577   23.5
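
For readers who want to see what the pipeline starts from, here is a minimal base R sketch of the object that pairwise.wilcox.test returns, again on a hypothetical mydat (an assumption). Its p.value element is a lower-triangular matrix; the group1 and group2 columns above correspond to its row and column labels. Note that pairwise.wilcox.test adjusts p-values for multiple comparisons (Holm's method by default), so with real data its values can differ from separate unadjusted wilcox.test calls.

```r
# Hypothetical stand-in for the data in the linked question (an assumption).
mydat <- data.frame(
  group = rep(1:3, each = 4),
  var   = rep(c(23, 23, 24, 24), times = 3)
)

# All pairwise comparisons; exact = FALSE is passed on to wilcox.test.
pw <- pairwise.wilcox.test(mydat$var, mydat$group, exact = FALSE)

# Matrix of adjusted p-values: rows are groups 2 and 3, columns are
# groups 1 and 2; cells above the diagonal are NA.
print(pw$p.value)

# First column = comparisons against group 1, the only ones the question needs.
vs_ref <- pw$p.value[, 1]
print(vs_ref)
```

Passing p.adjust.method = "none" would switch the adjustment off if unadjusted values are wanted.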



Answer 3:


Does this work for you? As mentioned there's a problem with your dput.

I had to use unlist for each group in order to use rbind, then a simple select from dplyr.

dat <- structure(list(`1` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame")), `2` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame")), `3` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame"))), .Dim = 3L, .Dimnames = structure(list(group = c("1", 
"2", "3")), .Names = "group"), class = c("psych", "describeBy"
))

require(tidyverse)

rbind(unlist(dat[[1]]),unlist(dat[[2]]),unlist(dat[[3]])) %>% 
  as.data.frame() %>% 
  select(mean, sd, median)
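
Listing dat[[1]], dat[[2]], dat[[3]] by hand only works for exactly three groups. As a hedged sketch of a variant that handles any number of groups, the per-group rows can be stacked with do.call(rbind, ...). To keep the example free of package dependencies it uses a hypothetical plain list, stats_by_group, of one-row data frames rather than the psych object itself (both the name and the simplified structure are assumptions):

```r
# Hypothetical stand-in for the per-group describe results: a named list of
# one-row data frames (an assumption; the real object carries psych classes).
stats_by_group <- lapply(1:3, function(g) {
  data.frame(n = 4, mean = 23.5, sd = 0.577350269189626, median = 23.5)
})
names(stats_by_group) <- as.character(1:3)

# Keep only the wanted columns from each group, then stack the rows.
# Works unchanged for any number of groups.
smry <- do.call(
  rbind,
  lapply(stats_by_group, function(d) d[, c("mean", "sd", "median")])
)
print(smry)
```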



Answer 4:


Another option is to pass the argument mat = TRUE to describeBy, which returns the statistics in matrix form rather than as a list:

    describeBy(mydat$var, mydat$group, mat = TRUE)

    # First, run the tests using the data and the code from the linked question:
    lst <- split(mydat, mydat$group)
    .ls <- lapply(lst[-1], function(DF) wilcox.test(DF$var, lst[[1]]$var, exact = FALSE))

    # Then extract the p-values, with "-" as a placeholder for the reference group
    .ls <- c("-", sapply(.ls, '[[', "p.value"))

    # Finally, combine the mean, sd and median columns (5 to 7) with the extracted p-values
    cbind(describeBy(mydat$var, mydat$group, mat = TRUE)[c(5, 6, 7)], "p.value" = .ls)

    # And the output:

           mean        sd median p.value
        11 23.5 0.5773503   23.5       -
        12 23.5 0.5773503   23.5       1
        13 23.5 0.5773503   23.5       1


Source: https://stackoverflow.com/questions/51495838/joining-the-result-of-two-statistical-tables-in-one-table-in-r
