Question
I have data like this:
library(data.table)
group <- c("a","a","a","b","b","b")
cond <- c("N","Y","N","Y","Y","N")
value <- c(2,1,3,4,2,5)
dt <- data.table(group, cond, value)
group cond value
a N 2
a Y 1
a N 3
b Y 4
b Y 2
b N 5
I would like to return, for every row of a group, the max value among that group's rows where cond is Y. Something like this:
group cond value max
a N 2 1
a Y 1 1
a N 3 1
b Y 4 4
b Y 2 4
b N 5 4
I've tried adding an ifelse condition to a grouped max; however, rows that don't meet the condition just get the "no" value of NA:
dt[, max := ifelse(cond=="Y", max(value), NA), by = group]
Answer 1:
Assuming that for each 'group' we need to get the max of 'value' where the 'cond' is "Y": after grouping by 'group', subset the 'value' with the logical condition (cond == 'Y') and get the max value.
dt[, max := max(value[cond == 'Y']), by = group]
dt
# group cond value max
#1: a N 2 1
#2: a Y 1 1
#3: a N 3 1
#4: b Y 4 4
#5: b Y 2 4
#6: b N 5 4
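One edge case worth keeping in mind: if a group has no rows with cond == 'Y', max() is called on an empty vector and returns -Inf with a warning. Below is a minimal sketch of one way to guard against that. The table dt2, its extra group "c", and the column name max_y are made up for illustration and are not part of the original question.

```r
library(data.table)

# Hypothetical data: group "c" has no rows with cond == "Y"
dt2 <- data.table(
  group = c("a", "a", "b", "b", "c"),
  cond  = c("N", "Y", "Y", "Y", "N"),
  value = c(2, 1, 4, 2, 5)
)

# Per group: take the max over the cond == "Y" rows,
# or NA when the group has no such rows (avoids max() on an empty vector)
dt2[, max_y := {
  v <- value[cond == "Y"]
  if (length(v)) max(v) else NA_real_
}, by = group]

dt2
```

Using NA_real_ rather than a bare NA keeps the result type the same (double) across groups, which := by group expects.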
Answer 2:
You could do...
dt[CJ(group = group, cond = "Y", unique=TRUE), on=.(group, cond),
.(mv = max(value))
, by=.EACHI]
# group cond mv
# 1: a Y 1
# 2: b Y 4
Using a join like this will eventually benefit from optimization of the max calculation.
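The join above returns one row per group rather than a new column on dt. A possible way to get the column asked for in the question is an update join that copies mv back onto every row of the matching group. This is only a sketch: the intermediate name res is an assumption, and dt$group is spelled out explicitly so the example does not rely on a separate group vector existing in the workspace.

```r
library(data.table)

dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b"),
  cond  = c("N", "Y", "N", "Y", "Y", "N"),
  value = c(2, 1, 3, 4, 2, 5)
)

# Per-group maximum over the cond == "Y" rows (one row per group)
res <- dt[CJ(group = dt$group, cond = "Y", unique = TRUE), on = .(group, cond),
          .(mv = max(value)), by = .EACHI]

# Update join: copy mv from res onto every row of dt with the same group
dt[res, on = .(group), max := i.mv]

dt
```

The i. prefix refers to the column coming from res, the table in the i position of the join.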
Another way (originally included in @akrun's answer):
dt[cond == "Y", mv := max(value), by=group]
From the prior link, we can see that this way is already optimized, except for the := part.
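Note that filtering in i assigns mv only on the rows where cond == "Y"; the other rows are left as NA, which differs from the output requested in the question. A minimal sketch of one way to spread the value to the rest of each group follows. The second step is an addition for illustration, not part of the original answer.

```r
library(data.table)

dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b"),
  cond  = c("N", "Y", "N", "Y", "Y", "N"),
  value = c(2, 1, 3, 4, 2, 5)
)

# Step 1: assign only within the cond == "Y" rows; other rows get NA
dt[cond == "Y", mv := max(value), by = group]

# Step 2: fill every row of a group with that group's first non-NA mv
# (stays NA if the group had no cond == "Y" rows)
dt[, mv := mv[!is.na(mv)][1L], by = group]

dt
```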
Source: https://stackoverflow.com/questions/54911691/max-by-group-with-condition-for-a-data-table