Groupwise summary statistics for all dependent variables in R using dplyr

Anonymous (unverified), submitted 2019-12-03 09:10:12

Question:

I am trying to generate groupwise summary statistics (mean, SD, min, max, standard error, etc.) for each of my 10 dependent variables, grouped by hearing status, my independent variable (stored in the column NHL; HL and NH are the two groups). I was able to do this for one variable (R_PTA) using either of these two pieces of code:

1.

```r
library(dplyr)

# Summary statistics of R_PTA for each level of NHL (HL, NH)
RightPTA <- mydata %>%
  group_by(NHL) %>%
  summarise(n          = length(R_PTA),
            mean_R_PTA = mean(R_PTA),
            sd_R_PTA   = sd(R_PTA),
            se_R_PTA   = sd(R_PTA) / sqrt(length(R_PTA)),
            min_R_PTA  = min(R_PTA),
            max_R_PTA  = max(R_PTA))
```

2.

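```r
# One tapply() call per statistic, splitting R_PTA by the grouping column NHL
# (distinct variable names avoid masking the mean(), sd(), ... functions)
mean_PTA   <- tapply(mydata$R_PTA, mydata$NHL, mean)
sd_PTA     <- tapply(mydata$R_PTA, mydata$NHL, sd)
median_PTA <- tapply(mydata$R_PTA, mydata$NHL, median)
max_PTA    <- tapply(mydata$R_PTA, mydata$NHL, max)
min_PTA    <- tapply(mydata$R_PTA, mydata$NHL, min)

# Combine into a group-by-statistic table, rounded to one decimal place
t1 <- round(cbind(mean = mean_PTA, sd = sd_PTA, median = median_PTA,
                  max = max_PTA, min = min_PTA), digits = 1)
t1
```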

Here is the output:

```
RightearPTA  mean  sd median  max  min
HL           26.9 7.3   27.5 37.5  8.8
NH           11.6 4.1   12.5 16.2  2.5
```

I want exactly the same thing for the remaining 9 variables (L_PTA, B_PTA, etc.), ideally in one shot. Is there a way to do this, or do I have to write code for each dependent variable separately? I am sure it's out there, but I can't find it! Any help would be appreciated!
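One way to avoid retyping the tapply() code for every variable is to wrap it in a function and apply that function to each dependent variable with lapply(). This is a minimal sketch, not code from the thread: the helper name group_summary() is hypothetical, the small mydata below reuses the question's column names (NHL, R_PTA, L_PTA) with made-up values, and it assumes every column except NHL is a dependent variable.

```r
# Hypothetical helper: the question's tapply()/cbind() table for one variable x,
# split by the grouping vector g and rounded to `digits` places
group_summary <- function(x, g, digits = 1) {
  round(cbind(mean   = tapply(x, g, mean),
              sd     = tapply(x, g, sd),
              median = tapply(x, g, median),
              max    = tapply(x, g, max),
              min    = tapply(x, g, min)),
        digits = digits)
}

# Made-up stand-in for mydata, using the question's column names
mydata <- data.frame(
  NHL   = c("HL", "HL", "HL", "NH", "NH", "NH"),
  R_PTA = c(20, 30, 40, 10, 12, 14),
  L_PTA = c(25, 35, 30, 8, 10, 15)
)

# Assumption: every column other than the grouping column is a dependent variable
dep_vars <- setdiff(names(mydata), "NHL")

# Apply the helper to each dependent variable: a named list of tables
tables <- lapply(mydata[dep_vars], group_summary, g = mydata$NHL)
tables$L_PTA
```

Each element of tables is a rounded mean/sd/median/max/min matrix laid out like t1 in the question, so tables$L_PTA holds the L_PTA version.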

Answer 1:

Consider a base R solution using by() (an object-oriented wrapper around tapply() that splits a data frame into subsets by factor levels) and a nested sapply() (to build a matrix of statistics for each subset). The example below demonstrates this with random, seeded data containing 10 statistics columns:

```r
set.seed(88)

df <- data.frame(
  GROUP = sapply(seq(50), function(i) sample(c("NH", "HL"), 1, replace=TRUE)),
  STAT1 = rnorm(50)*100,
  STAT2 = rnorm(50),
  STAT3 = runif(50)*100,
  STAT4 = runif(50),
  STAT5 = rgamma(50, shape = 2)*100,
  STAT6 = rgamma(50, shape = 2),
  STAT7 = rpois(50, lambda = 100)*100,
  STAT8 = rpois(50, lambda = 100),
  STAT9 = rexp(50, rate = 1)*100,
  STAT10 = rexp(50, rate = 1)
)

# For each GROUP, build a statistics-by-column matrix (column 1, GROUP, is skipped)
dfList <- by(df, df$GROUP, FUN = function(d)
  sapply(d[2:ncol(d)], function(i)
    c(mean   = mean(i, na.rm=TRUE),
      sd     = sd(i, na.rm=TRUE),
      median = median(i, na.rm=TRUE),
      min    = min(i, na.rm=TRUE),
      max    = max(i, na.rm=TRUE))
  )
)
```

Output

```r
dfList$HL

#              STAT1       STAT2     STAT3      STAT4     STAT5     STAT6      STAT7     STAT8      STAT9      STAT10
# mean     -6.594221 -0.04059519 52.990723 0.58753311 157.55220 1.9196911 10103.4483 101.17241 113.089148 0.771495372
# sd      102.512709  0.99159105 31.055376 0.27339871 152.37034 1.4880694   709.3673  10.02165 121.360898 0.720117072
# median    8.034055  0.01163562 56.416484 0.56894472 136.58274 1.5150241 10200.0000 103.00000  77.302150 0.599291434
# min    -199.786535 -1.84703449  1.345751 0.00207128  22.56936 0.1553518  8400.0000  82.00000   2.396641 0.006532798
# max     251.976970  2.55701655 98.612123 0.99413520 806.38484 7.1030277 11900.0000 120.00000 487.719745 3.133768953

dfList$NH

#             STAT1       STAT2      STAT3      STAT4    STAT5    STAT6      STAT7     STAT8      STAT9    STAT10
# mean     26.51853 -0.13748799 44.1973692 0.46621478 155.7555 1.880407  9961.9048 104.38095 150.596480 1.1243476
# sd       90.57645  0.77843518 29.9227560 0.30340507 121.5361 1.105004   868.6059   8.44083 131.123059 1.1627959
# median   24.52202 -0.02949522 46.1950960 0.33646282 114.7845 1.736198  9700.0000 105.00000 122.841835 0.7819896
# min    -105.54741 -1.58980314  0.2636007 0.02044767  17.3282 0.291350  8900.0000  89.00000   7.799051 0.1108107
# max     194.78958  1.35889041 96.0175463 0.99160167 434.5724 4.368176 12000.0000 120.00000 554.307036 5.1537741
```
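The question also asked for n and the standard error, which the statistics vector above leaves out, and by() returns one matrix per group rather than a single table. A minimal sketch of both extensions, assuming (as in the answer's df) that the grouping column comes first: the helper name col_stats() is hypothetical, se is computed as sd/sqrt(n), and the tiny data set is made up rather than the seeded data above. The per-group matrices are stacked with do.call(rbind, ...) into one long data frame.

```r
# Hypothetical per-column function: the answer's statistics plus n and se
col_stats <- function(i) {
  i <- i[!is.na(i)]                     # drop missing values once, up front
  c(n      = length(i),
    mean   = mean(i),
    sd     = sd(i),
    se     = sd(i) / sqrt(length(i)),
    median = median(i),
    min    = min(i),
    max    = max(i))
}

# Made-up data with the grouping column first, as in the answer's df
df <- data.frame(
  GROUP = c("HL", "HL", "HL", "HL", "NH", "NH", "NH", "NH"),
  STAT1 = c(2, 4, 6, 8, 1, 3, 5, NA),
  STAT2 = c(10, 10, 20, 20, 5, 5, 5, 5)
)

# Same by() + sapply() pattern as the answer; d[-1] skips the GROUP column
dfList <- by(df, df$GROUP, FUN = function(d) sapply(d[-1], col_stats))

# Stack the per-group matrices: one row per group x statistic,
# one column per variable
long <- do.call(rbind, lapply(names(dfList), function(g)
  data.frame(GROUP = g, stat = rownames(dfList[[g]]), dfList[[g]],
             row.names = NULL)))
long
```

Dropping missing values once inside col_stats() means n counts only the non-missing observations, so se uses the same sample size as sd; a plain length(i) next to the answer's per-call na.rm = TRUE would also count the NAs.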

