Aggregating if each observation can belong to multiple groups with multiple grouping variables

佐手、 submitted on 2019-12-08 11:58:27

Question


This question is a follow-up to: Aggregating if each observation can belong to multiple groups.

As in the linked question, my observations can belong to several groups. But now I have two grouping variables, which makes the problem much harder (at least for me). In the example below, an observation can belong to one or more of the groups A, B, C. But I also want to distinguish according to another factor, i.e. whether x < 1, x < .5 or x < 0. Since every x smaller than 0 is also smaller than 1, each observation can again belong to more than one group. I want to aggregate according to both groupings (A, B, C and x < 1, x < .5, x < 0) and get as a result an aggregate for all combinations ((A and x < 1), (A and x < .5), ..., (C and x < 0)). Let me know if the question is not clear enough, and feel free to edit the title since I could not come up with a proper one.

# The data
library(data.table)
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = n/4)


df <- data.table(time = time, x = rnorm(n), groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))

df[ ,c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]

# The result should look like this (a solution for wide format would be nice as well) but less repetitive
rbind(
df[smaller1 == TRUE , .(lapply(.SD*x, sum), c("A_smaller1", "B_smaller1", "C_smaller1")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller.5 == TRUE , .(lapply(.SD*x, sum), c("A_smaller.5", "B_smaller.5", "C_smaller.5")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")],
df[smaller0 == TRUE , .(lapply(.SD*x, sum), c("A_smaller0", "B_smaller0", "C_smaller0")), by=.(time),.SDcols = c("groupA", "groupB", "groupC")]
)

Answer 1:


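First, you can melt the group columns into long format and keep only the rows where the group flag is TRUE. Next, use CJ (i.e. cross join) to create a table of all combinations of time, group and threshold. Then perform a non-equi join of those combinations against the melted dataset and sum x within each combination, as follows: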

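# Melt only the group columns; measure.vars keeps the smaller* helper
# columns created in the question from being melted as extra groups
mDT <- melt(df, id.vars=c("time", "x"),
            measure.vars=c("groupA", "groupB", "groupC"))[(value)]
# For every (time, group, Level) combination, sum the x values below Level;
# in the result, the column named x holds the Level value of each row
mDT[CJ(time=time, variable=variable, Level=seq(0,1,0.5), unique=TRUE), 
    sum(x.x), 
    by=.EACHI, 
    on=.(time, variable, x < Level)]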
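
Since the question also asks for a wide format, here is a minimal sketch of the same melt, CJ and non-equi join steps on a tiny made-up table, followed by a reshape with dcast. The table toy, its values, and the names long, combos, res, wide and total are illustrative assumptions and do not come from the original post. Combinations with no matching observation come out as NA.

```r
library(data.table)

# Hypothetical toy data: 4 observations with two overlapping group flags
toy <- data.table(
  time   = c(1, 1, 1, 2),
  x      = c(-0.2, 0.3, 0.8, 0.4),
  groupA = c(TRUE, TRUE, FALSE, TRUE),
  groupB = c(TRUE, FALSE, TRUE, TRUE)
)

# Long format: one row per (observation, group) membership
long <- melt(toy, id.vars = c("time", "x"),
             measure.vars = c("groupA", "groupB"))[(value)]

# Every (time, group, threshold) combination
combos <- CJ(time = long$time, variable = long$variable,
             Level = c(0, 0.5, 1), unique = TRUE)

# Non-equi join: for each combination, sum the x values below the threshold
res <- long[combos, .(total = sum(x.x)), by = .EACHI,
            on = .(time, variable, x < Level)]

# After the join, the column named x carries the threshold, so rename it
setnames(res, "x", "Level")

# Wide format: one row per time, one column per group/threshold pair
wide <- dcast(res, time ~ variable + Level, value.var = "total")
print(wide)
```

The long result res is usually easier to filter or plot, while wide is closer to a report table, with columns named after the group and threshold (for example groupA_0.5).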


Source: https://stackoverflow.com/questions/50487229/aggregating-if-each-observation-can-belong-to-multiple-groups-with-multiple-grou
