plyr package writing the same function over multiple columns

Submitted by 一个人想着一个人 on 2019-12-06 02:07:41

Question


I want to apply the same function to multiple columns using the ddply function, but I'm tired of writing each column out in one long line. Is there a better way of doing this?

Here's a simple version of the data:

data<-data.frame(TYPE=as.integer(runif(20,1,3)),A_MEAN_WEIGHT=runif(20,1,100),B_MEAN_WEIGHT=runif(20,1,10))

and I want to find out the sum of columns A_MEAN_WEIGHT and B_MEAN_WEIGHT by doing this:

ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT))

But in my real data I have more than 8 "*_MEAN_WEIGHT" columns, and I'm tired of writing them out 8 times, like this:

ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT),MEAN_C=sum(C_MEAN_WEIGHT),MEAN_D=sum(D_MEAN_WEIGHT),MEAN_E=sum(E_MEAN_WEIGHT),MEAN_F=sum(F_MEAN_WEIGHT),MEAN_G=sum(G_MEAN_WEIGHT),MEAN_H=sum(H_MEAN_WEIGHT))

Is there a better way to write this? Thank you for your help!!


Answer 1:


The plyr-centred approach is to use colwise, e.g.:

 ddply(data, .(TYPE), colwise(sum))
  TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1    1      319.8977      60.80317
2    2      621.6745      37.05863

You can pass the column names as the .cols argument of colwise if you want only a subset.

You can also use numcolwise or catcolwise to act on numeric or categorical columns only.
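As a minimal sketch of the .cols idea, the following picks the columns by name pattern rather than typing them out. The seed and the weight_cols helper variable are assumptions added for illustration; they are not part of the original answer.

```r
library(plyr)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible
data <- data.frame(TYPE = as.integer(runif(20, 1, 3)),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10))

# Collect every column whose name ends in "_MEAN_WEIGHT"
weight_cols <- grep("_MEAN_WEIGHT$", names(data), value = TRUE)

# Sum only those columns within each TYPE group
sums <- ddply(data, .(TYPE), colwise(sum, .cols = weight_cols))
print(sums)
```

With eight or more "*_MEAN_WEIGHT" columns the call stays the same, since grep picks up all of them.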

Note that you could use sapply in place of the most basic use of colwise (this example computes the mean rather than the sum):

ddply(data, .(TYPE), sapply, FUN = 'mean') 

The idiomatic data.table approach is to use lapply(.SD, fun), e.g.:

library(data.table)
dt <- data.table(data)
dt[,lapply(.SD, sum) ,by = TYPE]
   TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1:    2      621.6745      37.05863
2:    1      319.8977      60.80317
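If the table also holds columns that should not be summed, .SDcols can restrict .SD to the ones of interest. This is a sketch under the assumption that the target columns all end in "_MEAN_WEIGHT"; the seed and the weight_cols variable are illustrative additions.

```r
library(data.table)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible
dt <- data.table(TYPE = as.integer(runif(20, 1, 3)),
                 A_MEAN_WEIGHT = runif(20, 1, 100),
                 B_MEAN_WEIGHT = runif(20, 1, 10))

# Collect every column whose name ends in "_MEAN_WEIGHT"
weight_cols <- grep("_MEAN_WEIGHT$", names(dt), value = TRUE)

# .SDcols limits .SD to the selected columns before lapply runs
sums <- dt[, lapply(.SD, sum), by = TYPE, .SDcols = weight_cols]
print(sums)
```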



Answer 2:


Try this:

ddply(data, .(TYPE), colSums)

Here's a (slower) equivalent of the above, which can be tweaked to use any function in place of the sum:

ddply(data, .(TYPE), function(x) {apply(x, 2, sum)})

And if you want to preserve the TYPE column, something like this will do:

ddply(data, .(TYPE), function(x) {apply(x[,names(x) != "TYPE"], 2, sum)})

Better yet, use data.table instead of plyr:

library(data.table)
dt = data.table(data)

# just sums
dt[, data.table(t(colSums(.SD))), by = TYPE]

# sum for "A" and "B", and sqrt(sum) for "C" and "D"
# note: you will have to call setnames() to fix the column names after
dt[, data.table(t(colSums(.SD[, c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT"), with = FALSE])),
                t(apply(.SD[, c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT"), with = FALSE],
                        2, function(x) sqrt(sum(x))))),
     by = TYPE]
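One possible variation, not from the original answer, is to combine two lapply calls with c() so that j returns a named list and the column names carry over without a setnames() step. The sample data here is hypothetical: the question's example only has the A and B columns, so C and D are made up for the sketch, as are the seed and the sum_cols / sqrt_cols variables.

```r
library(data.table)

set.seed(42)  # hypothetical seed, only so the sketch is reproducible
# Hypothetical data with four weight columns (C and D are not in the question's sample)
dt <- data.table(TYPE = as.integer(runif(20, 1, 3)),
                 A_MEAN_WEIGHT = runif(20, 1, 100),
                 B_MEAN_WEIGHT = runif(20, 1, 10),
                 C_MEAN_WEIGHT = runif(20, 1, 100),
                 D_MEAN_WEIGHT = runif(20, 1, 10))

sum_cols  <- c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT")
sqrt_cols <- c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT")

# Each lapply returns a named list; c() joins them into one row per group
res <- dt[, c(lapply(.SD[, sum_cols, with = FALSE], sum),
              lapply(.SD[, sqrt_cols, with = FALSE], function(x) sqrt(sum(x)))),
          by = TYPE]
print(res)
```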


Source: https://stackoverflow.com/questions/16090532/plyr-package-writing-the-same-function-over-multiple-columns
