How to create summaries of subgroups based on factors in R

陌路散爱 提交于 2019-12-11 08:35:38

问题


I want to calculate the mean for each numeric variable in the following example. These need to be grouped by each factor associated with "id" and by each factor associated with"status".

set.seed(10)
dfex <- 
data.frame(id=c("2","1","1","1","3","2","3"),status=c("hit","miss","miss","hit","miss","miss","miss"),var3=rnorm(7),var4=rnorm(7),var5=rnorm(7),var6=rnorm(7))

For the means of "id" groups, the first row of output would be labeled "mean-id-1". Rows labeled "mean-id-2" and "mean-id-3" would follow. For the means of "status" groups, the rows would be labeled "mean-status-miss" and "mean-status-hit". My objective is to generate these means and their row labels programatically.

I've tried many different permutations of apply functions, but each has issues. I've also experimented with the aggregate function.


回答1:


With base R the following works for the "id" column:

means_id <- aggregate(dfex[,grep("var",names(dfex))],list(dfex$id),mean)
rownames(means_id) <- paste0("mean-id-",means_id$Group.1)
means_id$Group.1 <- NULL

Output:

                var3       var4       var5       var6
mean-id-1 -0.7182503 -0.2604572 -0.3535823 -1.3530417
mean-id-2  0.2042702 -0.3009548  0.6121843 -1.4364211
mean-id-3 -0.4567655  0.8716131  0.1646053 -0.6229102

The same for the "status" column:

means_status <- aggregate(dfex[,grep("var",names(dfex))],list(dfex$status),mean)
rownames(means_status) <- paste0("mean-status-",means_status$Group.1)
means_status$Group.1 <- NULL



回答2:


Probably the fastest way to do this will be with data.table (for big data sets), although I didn't find a way to present new row names in data.table object, thus I converted it back to data.frame

library(data.table)
setDT(dfex) # convert `dfex` to a `data.table` object
#setkey(dfex, id) # This is not necessary, only if you want to sort your table by "id" column first
dat1 <- as.data.frame(dfex[,-2, with = F][, lapply(.SD, mean), by = id])
rownames(dat1) <- paste0("mean-id-", as.character(dat1[,"id"]))
dat2 <- as.data.frame(dfex[,-1, with = F][, lapply(.SD, mean), by = status])
rownames(dat2) <- paste0("mean-status-", as.character(dat2[,"status"]))



回答3:


You could do:

do.call(rbind,by(dfex[,-(1:2)], paste("mean-id",dfex[,1],sep="-"), colMeans)) 
              var3       var4       var5       var6
mean-id-1 -0.7383944  0.5005763 -0.4777325  0.6988741
mean-id-2 -0.0316267 -0.1764453  0.1313834  0.6867287
mean-id-3  0.7489377  0.8091953  0.9290247 -0.1263163

Create both result as a list:

 lapply(c("id","status"), function(x) do.call(rbind,by(dfex[grep("var",names(dfex))], paste("mean-id",dfex[,x],sep="-"), colMeans)))

Update:

library(matrixStats)
 lapply(c("id","status"), function(x) do.call(rbind,by(dfex[grep("var",names(dfex))], paste("mean-id",dfex[,x],sep="-"), colSds)))
 [[1]]
              var3       var4      var5      var6
 mean-id-1 0.6024318 1.36423044 0.5398717 0.7260939
 mean-id-2 0.2623706 0.08870122 0.1827246 1.0590560
 mean-id-3 1.0625137 0.16381062 1.0760977 0.3524908

[[2]]
                  var3     var4      var5      var6
mean-id-hit  0.4369311 1.036234 0.6622341 0.6506010
mean-id-miss 0.8288436 1.035163 0.7688912 0.6799636


来源:https://stackoverflow.com/questions/24198629/how-to-create-summaries-of-subgroups-based-on-factors-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!