Question:
I want to apply the same function to multiple columns using the ddply function, but I'm tired of writing them all out in one line. Is there a better way of doing this?
Here's a simple version of the data:
```r
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10))
```
I want to find the sums of columns A_MEAN_WEIGHT and B_MEAN_WEIGHT for each TYPE by doing this:
```r
ddply(data, .(TYPE), summarise, MEAN_A = sum(A_MEAN_WEIGHT), MEAN_B = sum(B_MEAN_WEIGHT))
```
But my real data has more than 8 "*_MEAN_WEIGHT" columns, and I'm tired of writing all of them out like this:
```r
ddply(data, .(TYPE), summarise,
      MEAN_A = sum(A_MEAN_WEIGHT), MEAN_B = sum(B_MEAN_WEIGHT),
      MEAN_C = sum(C_MEAN_WEIGHT), MEAN_D = sum(D_MEAN_WEIGHT),
      MEAN_E = sum(E_MEAN_WEIGHT), MEAN_F = sum(F_MEAN_WEIGHT),
      MEAN_G = sum(G_MEAN_WEIGHT), MEAN_H = sum(H_MEAN_WEIGHT))
```
Is there a better way to write this? Thank you for your help!!
Answer 1:
The `plyr`-centred approach is to use `colwise`, e.g.:

```r
ddply(data, .(TYPE), colwise(sum))
#   TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
# 1    1      319.8977      60.80317
# 2    2      621.6745      37.05863
```
You can pass the column names as the `.cols` argument if you want only a subset.
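As a minimal sketch of `.cols` (assuming `plyr` is installed; the seed and the `weight_cols` name are illustrative additions, not part of the answer), the columns can be picked by a name pattern instead of being typed out one by one:

```r
library(plyr)

set.seed(42)  # illustrative seed so the sketch is reproducible
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10))

# Select every column whose name ends in "_MEAN_WEIGHT"
weight_cols <- grep("_MEAN_WEIGHT$", names(data), value = TRUE)

# colwise() applies sum() to only those columns within each TYPE group
res <- ddply(data, .(TYPE), colwise(sum, weight_cols))
print(res)
```

With 8 or more weight columns, only the `grep` pattern has to be right; the `ddply` call stays the same length.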
You can also use `numcolwise` or `catcolwise` to act on numeric or categorical columns only.
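A small sketch of `numcolwise`, assuming a hypothetical character column `LABEL` has been added to the question's data to show that non-numeric columns are skipped rather than passed to `sum`:

```r
library(plyr)

set.seed(1)  # illustrative seed
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   LABEL = sample(c("x", "y"), 20, replace = TRUE),  # hypothetical text column
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10),
                   stringsAsFactors = FALSE)

# numcolwise() only touches numeric columns, so LABEL is dropped from the result;
# the grouping column TYPE is excluded from the summing and re-attached by ddply
res <- ddply(data, .(TYPE), numcolwise(sum))
print(res)
```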
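Note that you could use `sapply` in place of the most basic use of `colwise`:

```r
ddply(data, .(TYPE), sapply, FUN = 'mean')
```

Unlike `colwise`, `sapply` also applies the function to the `TYPE` column itself. With `mean` that leaves `TYPE` unchanged, but with `sum` the group label would be replaced by the sum of `TYPE` within each group.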
The idiomatic `data.table` approach is to use `lapply(.SD, fun)`, e.g.:

```r
library(data.table)
dt <- data.table(data)
dt[, lapply(.SD, sum), by = TYPE]
#    TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
# 1:    2      621.6745      37.05863
# 2:    1      319.8977      60.80317
```
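If the table also holds columns that should not be summed, `.SDcols` restricts `.SD` to a chosen set. This sketch assumes a hypothetical extra column `OTHER` and selects the weight columns by name pattern with `grep`:

```r
library(data.table)

set.seed(7)  # illustrative seed
dt <- data.table(TYPE = as.integer(runif(20, 1, 3)),
                 A_MEAN_WEIGHT = runif(20, 1, 100),
                 B_MEAN_WEIGHT = runif(20, 1, 10),
                 OTHER = runif(20))  # hypothetical column that should not be summed

weight_cols <- grep("_MEAN_WEIGHT$", names(dt), value = TRUE)

# .SD now contains only the *_MEAN_WEIGHT columns; OTHER is left out of the result
res <- dt[, lapply(.SD, sum), by = TYPE, .SDcols = weight_cols]
print(res)
```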
Answer 2:
Try this:
```r
ddply(data, .(TYPE), colSums)
```
Here's a (slower) equivalent of the above that can be adapted to apply any function instead of summing:
```r
ddply(data, .(TYPE), function(x) {apply(x, 2, sum)})
```
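And if you want to preserve the `TYPE` column (the versions above also sum `TYPE` itself, so it no longer holds the group label), something like this will do:

```r
# drop = FALSE keeps a data frame even if only one column is left
ddply(data, .(TYPE), function(x) {apply(x[, names(x) != "TYPE", drop = FALSE], 2, sum)})
```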
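The same pattern accepts any per-column function. As an illustration only (the `sqrt(sum(col))` choice and the seed are assumptions, not part of the answer), this variant returns the square root of each group sum while `TYPE` keeps its group label:

```r
library(plyr)

set.seed(3)  # illustrative seed
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10))

# Swap sum for any function of one column; TYPE is excluded before apply()
res <- ddply(data, .(TYPE), function(x) {
  apply(x[, names(x) != "TYPE", drop = FALSE], 2, function(col) sqrt(sum(col)))
})
print(res)
```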
Better yet, use `data.table` instead of `plyr`:
```r
library(data.table)
dt = data.table(data)

# just sums
dt[, data.table(t(colSums(.SD))), by = TYPE]

# sum for "A" and "B", and sqrt(sum) for "C" and "D"
# (assumes the real data also has C_MEAN_WEIGHT and D_MEAN_WEIGHT columns)
# note: you will have to call setnames() to fix the column names after
dt[, data.table(t(colSums(.SD[, c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT"), with = FALSE])),
                t(apply(.SD[, c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT"), with = FALSE], 2,
                        function(x) sqrt(sum(x))))),
   by = TYPE]
```
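A sketch of the mixed case without the `t()`/`data.table()` wrapping, assuming hypothetical `C_MEAN_WEIGHT` and `D_MEAN_WEIGHT` columns (the question's sample data only has A and B). Concatenating two named `lapply` results with `c()` keeps the original column names, and `setnames()` is then used only to make them more descriptive; names such as `SQRT_SUM_C` are made up for illustration:

```r
library(data.table)

set.seed(5)  # illustrative seed
# Hypothetical data: C and D columns are added so the mixed case can run
dt <- data.table(TYPE = as.integer(runif(20, 1, 3)),
                 A_MEAN_WEIGHT = runif(20, 1, 100),
                 B_MEAN_WEIGHT = runif(20, 1, 10),
                 C_MEAN_WEIGHT = runif(20, 1, 10),
                 D_MEAN_WEIGHT = runif(20, 1, 10))

sum_cols  <- c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT")
sqrt_cols <- c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT")

# c() of two named lists gives one list of columns with the original names
res <- dt[, c(lapply(.SD[, sum_cols, with = FALSE], sum),
              lapply(.SD[, sqrt_cols, with = FALSE], function(x) sqrt(sum(x)))),
          by = TYPE]

# Rename so each column says which function was applied
setnames(res, c(sum_cols, sqrt_cols), c("SUM_A", "SUM_B", "SQRT_SUM_C", "SQRT_SUM_D"))
print(res)
```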