ddply to multiple columns equivalent in data.table

Submitted by 心不动则不痛 on 2019-12-21 04:31:37

Question


I am a big fan of the data.table package, but I am having trouble converting some code that uses ddply from the plyr package into its data.table equivalent. The ddply code is:

library(plyr)

dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54),
  age2 = runif(n = 29, min = 18, max = 54)
)

# Sum every numeric column within each group/sex combination
ddply(dfx, .(group, sex), numcolwise(sum))

What I want to do is sum across multiple columns without having to manually specify the column names. The manual equivalent in the data.table package is:

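library(data.table)

dfx.dt = data.table(dfx)
# Add one summed column per variable, by reference
dfx.dt[ , sum.age := sum(age), by="group,sex"]
dfx.dt[ , sum.age2 := sum(age2), by="group,sex"]
# Keep only the first row of each group/sex combination
dfx.dt[!duplicated(dfx.dt[ , {list(group, sex)}]), ]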

To be explicit, my question is "is there a way to do the equivalent of the ddply code in data.table?"

Any help is greatly appreciated, thanks.


Answer 1:


Yes, there's a way:

dfx.dt[,lapply(.SD,sum),by='group,sex']

This is mentioned in section 2.1 of the FAQ for data.table.
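As a rough sketch of how this can be narrowed to numeric columns only (closer to what numcolwise does), the `.SDcols` argument can restrict `.SD` to a chosen set of columns. The `set.seed` call and the names `dt`, `num_cols`, and `sums` are assumptions added for illustration. The sketch starts from a fresh table: the `:=` calls in the question add `sum.age` and `sum.age2` to `dfx.dt` by reference, so those columns would otherwise be summed as well.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex   = sample(c("M", "F"), size = 29, replace = TRUE),
  age   = runif(n = 29, min = 18, max = 54),
  age2  = runif(n = 29, min = 18, max = 54)
)
dt <- as.data.table(dfx)

# Pick the numeric columns programmatically instead of typing their names
num_cols <- names(dt)[sapply(dt, is.numeric)]

# Sum only those columns within each group/sex combination
sums <- dt[, lapply(.SD, sum), by = .(group, sex), .SDcols = num_cols]
print(sums)
```

The result has one row per group/sex combination, with `age` and `age2` holding the per-group sums.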



Source: https://stackoverflow.com/questions/18876783/ddply-to-multiple-columns-equivalent-in-data-table
