Remove grouping variable for data.table

Posted by 无人久伴 on 2021-02-07 20:30:24

Question


I'd like to use data.table to do some wrangling and would like my resulting data table to not include the grouping variable.

Here's a MWE:

library("data.table")
DT <- data.table(x = 1:10, grp = rep(1:2,5))
DT[, .(mmm = mean(x)), by = grp]

This produces:

   grp mmm
1:   1   5
2:   2   6

which is all fine. However, I'd prefer the grp column not to be in the result. This can be fixed by chaining the data.table calls and setting grp := NULL, or by just throwing the variable away afterwards, but can I prevent it in the first call so that only mmm is returned?


Answer 1:


It isn't clear why you don't want to use this. Using DT[, .(mmm = mean(x)), by = grp][, grp := NULL][] would be my first choice.

Although I wouldn't advise it, you can also use:

DT[, .(mmm = DT[, .(mmm = mean(x)), by = grp]$mmm)]

which will give you the desired result as well:

   mmm
1:   5
2:   6

Although you will get the same result, it is better not to use this method. Its major drawback is that your code becomes unnecessarily complicated when you want to summarise more than one value column. With a second value column y, for example, you would get something like:

DT[, .(mx = DT[, .(mx = mean(x)), by = grp]$mx, my = DT[, .(my = mean(y)), by = grp]$my)]

while using the normal data.table-way would be:

DT[, .(mx = mean(x), my = mean(y)), by = grp][, grp := NULL][]
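The two forms above can be compared side by side in a minimal sketch. Note that the DT from the question has no y column, so the y = 11:20 below is an assumption added purely for illustration, as are the names nested and chained:

```r
library(data.table)

# Example data from the question, plus a hypothetical second value column y
DT <- data.table(x = 1:10, y = 11:20, grp = rep(1:2, 5))

# Nested form: one full grouped aggregation per summarised column
nested <- DT[, .(mx = DT[, .(mx = mean(x)), by = grp]$mx,
                 my = DT[, .(my = mean(y)), by = grp]$my)]

# Chained form: aggregate once, then drop the grouping column by reference
chained <- DT[, .(mx = mean(x), my = mean(y)), by = grp][, grp := NULL][]

print(chained)

# Both approaches return the same columns with the same values
stopifnot(identical(names(chained), c("mx", "my")))
stopifnot(isTRUE(all.equal(nested$mx, chained$mx)))
stopifnot(isTRUE(all.equal(nested$my, chained$my)))
```

The chained form groups the data only once, while the nested form repeats the grouped aggregation for every summarised column.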

To conclude:

Using the DT[, .(mmm = mean(x)), by = grp][, grp := NULL][] method would thus be your best choice.



Source: https://stackoverflow.com/questions/47497386/remove-grouping-variable-for-data-table
