Equivalent to ddply(…,transform,…) in data.table

Posted by 徘徊边缘 on 2019-12-22 05:19:30

Question


I have the following code using ddply from the plyr package:

ddply(mtcars,.(cyl),transform,freq=length(cyl))

The data.table version of this is:

library(data.table)
DT <- data.table(mtcars)

DT[,freq:=.N,by=cyl]

How can I extend this to more than one computed column? For example, I want to run the following in both ddply and data.table:

ddply(mtcars,.(cyl),transform,freq=length(cyl),sum=sum(mpg))

DT[,list(freq=.N,sum=sum(mpg)),by=cyl] 

But data.table returns only three columns: cyl, freq, and sum. I could list the remaining columns explicitly, like this:

DT[,list(freq=.N,sum=sum(mpg),mpg,disp,hp,drat,wt,qsec,vs,am,gear,carb),by=cyl]

But I have a large number of variables in my real data, and I want all of them to be kept, as in ddply(..., transform, ...). Is there a shortcut in data.table, like using := when there is only one function (as above), or something like paste(names(mtcars), collapse=",") inside data.table? Note: I also have a large number of functions to run, so I can't repeat := many times (though I would prefer that if lapply can be applied here).


Answer 1:


Use the backquoted, functional form of := like this:

DT[ , `:=`( freq = .N , sum = sum(mpg) ) , by=cyl ]
head( DT , 3 )
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb freq   sum
#1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    7 138.2
#2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    7 138.2
#3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   11 293.3
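Since the question also asks whether lapply can be combined with :=, here is a minimal sketch of one way that might look. The names funs, cols, and newcols are hypothetical and chosen only for illustration; the approach assumes every function returns a single value per group.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Hypothetical inputs: the summary functions to apply and the columns to apply them to
funs <- list(sum = sum, mean = mean)
cols <- c("mpg", "hp")

for (fn in names(funs)) {
  # Build names such as "mpg_sum" and "hp_sum" for the new columns
  newcols <- paste(cols, fn, sep = "_")
  # .SD holds the columns named in .SDcols for the current group;
  # := adds the results by reference and keeps every existing column
  DT[, (newcols) := lapply(.SD, funs[[fn]]), by = cyl, .SDcols = cols]
}

head(DT, 3)
```

Each pass through the loop adds one new column per entry in cols, so all 32 rows and all original columns of mtcars stay in place, as with ddply(..., transform, ...).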



Answer 2:


Also useful in some situations, when the new column names are stored in a character vector:

newvars <- c("freq","sum")
DT[, `:=`(eval(newvars), list(.N,sum(mpg)))]
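Note that the call above has no by argument, so .N and sum(mpg) are computed over the whole table rather than per cyl group. A minimal sketch of a grouped variant follows; it assumes the per-cylinder values from the question are what is wanted, and it wraps the name vector in parentheses on the left-hand side, which is another way to pass column names held in a variable.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Names of the columns to create, stored in a character vector
newvars <- c("freq", "sum")

# (newvars) tells := to use the contents of newvars as column names;
# by = cyl makes .N and sum(mpg) per-group values
DT[, (newvars) := list(.N, sum(mpg)), by = cyl]

head(DT, 3)
```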


Source: https://stackoverflow.com/questions/19569145/equivalent-to-ddply-transform-in-data-table
