Question
This question is a follow-up to: Aggregating if each observation can belong to multiple groups.
As in the linked question, my observations can belong to several groups. But now I have two grouping variables, which makes the problem much harder (at least for me). In the example below, an observation can belong to one or more of the groups A, B, C. I also want to distinguish by a second factor, namely whether x < 1, x < 0.5 or x < 0. Since every x smaller than 0 is also smaller than 1, each observation can again belong to more than one group. I want to aggregate by both groupings (A, B, C and x < 1, x < 0.5, x < 0) and get an aggregate for every combination: (A and x < 1), (A and x < 0.5), ..., (C and x < 0). Let me know if the question is not clear enough, and feel free to edit the title since I could not come up with a proper one.
# The data
library(data.table)
n <- 500
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = n/4)
df <- data.table(time = time, x = rnorm(n),
                 groupA = sample(TF, size = n, replace = TRUE),
                 groupB = sample(TF, size = n, replace = TRUE),
                 groupC = sample(TF, size = n, replace = TRUE))
df[, c("smaller1", "smaller.5", "smaller0") := .(x <= 1, x <= 0.5, x <= 0)]
# The result should look like this (a solution for wide format would be nice as well) but less repetitive
rbind(
  df[smaller1 == TRUE, .(lapply(.SD * x, sum), c("A_smaller1", "B_smaller1", "C_smaller1")),
     by = .(time), .SDcols = c("groupA", "groupB", "groupC")],
  df[smaller.5 == TRUE, .(lapply(.SD * x, sum), c("A_smaller.5", "B_smaller.5", "C_smaller.5")),
     by = .(time), .SDcols = c("groupA", "groupB", "groupC")],
  df[smaller0 == TRUE, .(lapply(.SD * x, sum), c("A_smaller0", "B_smaller0", "C_smaller0")),
     by = .(time), .SDcols = c("groupA", "groupB", "groupC")]
)
Answer 1:
First, melt the data and keep only the rows where group membership is TRUE. Next, use CJ (i.e. cross join) to build a table of all combinations of time, group and threshold. Then perform a non-equi join of that table with the melted data and sum x for each combination, as follows:
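# Long format: one row per (observation, group); keep only actual members.
# measure.vars restricts the melt to the group columns, so the smaller* flags
# created in the question are not melted as if they were groups.
mDT <- melt(df, id.vars = c("time", "x"),
            measure.vars = c("groupA", "groupB", "groupC"))[(value)]
# For every (time, group, threshold) combination, sum x over rows with x < Level.
# x.x is the original x from mDT; the join column x in the output holds Level.
mDT[CJ(time = time, variable = variable, Level = seq(0, 1, 0.5), unique = TRUE),
    sum(x.x),
    by = .EACHI,
    on = .(time, variable, x < Level)]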
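To see what the join returns, and since the question also asks for a wide format, here is a minimal sketch on a tiny made-up table. The names toy, long, grid, res, wide and total are hypothetical and not part of the original answer; the dcast step is one possible way to go wide, not something the answer itself proposes.

```r
library(data.table)

# Hypothetical toy data (not the question's data): one time period,
# four observations, two group flags.
toy <- data.table(
  time   = 1L,
  x      = c(-0.2, 0.3, 0.8, 1.5),
  groupA = c(TRUE, TRUE, FALSE, TRUE),
  groupB = c(TRUE, FALSE, TRUE, TRUE)
)

# Long format, keeping only rows where the observation belongs to the group.
long <- melt(toy, id.vars = c("time", "x"),
             measure.vars = c("groupA", "groupB"))[(value)]

# All (time, group, threshold) combinations.
grid <- CJ(time = long$time, variable = long$variable,
           Level = c(0, 0.5, 1), unique = TRUE)

# Non-equi join: for each grid row, sum the original x over rows with x < Level.
res <- long[grid, .(total = sum(x.x)), by = .EACHI,
            on = .(time, variable, x < Level)]

# In the result, the join column is named x but holds the threshold values.
setnames(res, "x", "Level")

# Wide format: one column per group, one row per (time, threshold).
wide <- dcast(res, time + Level ~ variable, value.var = "total")
print(wide)
```

Two things to keep in mind: a combination with no matching rows yields NA rather than 0, and the join uses a strict x < Level while the question's flags use <=. Writing on = .(time, variable, x <= Level) should reproduce the flags exactly, although with continuous data such as rnorm the two will almost surely agree.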
Source: https://stackoverflow.com/questions/50487229/aggregating-if-each-observation-can-belong-to-multiple-groups-with-multiple-grou