Max by Group with Condition for a data.table

Posted by 感情迁移 on 2021-01-28 14:01:06

Question


I have data like this:

library(data.table)
group <- c("a","a","a","b","b","b")
cond <- c("N","Y","N","Y","Y","N")
value <- c(2,1,3,4,2,5)

dt <- data.table(group, cond, value)

group cond value
a     N    2
a     Y    1
a     N    3
b     Y    4
b     Y    2
b     N    5

For every row in a group, I would like to return the max value among that group's rows where cond is "Y". Something like this:

group cond value max
a     N    2     1
a     Y    1     1
a     N    3     1
b     Y    4     4
b     Y    2     4
b     N    5     4

I've tried wrapping a grouped max in ifelse, but the rows that don't meet the condition just get the `no` value, NA:

dt[, max := ifelse(cond=="Y", max(value), NA), by = group]

Answer 1:


Assuming we need, for each 'group', the max of 'value' where 'cond' is "Y": group by 'group', subset 'value' with the logical condition (cond == 'Y'), and take the max of that subset. Because a single value is returned per group, it is recycled to every row of the group.

dt[, max := max(value[cond == 'Y']), by = group]
dt
#   group cond value max
#1:     a    N     2   1
#2:     a    Y     1   1
#3:     a    N     3   1
#4:     b    Y     4   4
#5:     b    Y     2   4
#6:     b    N     5   4
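One edge case worth noting: if a group has no rows with cond == "Y", value[cond == 'Y'] is empty, and max() of an empty vector returns -Inf with a warning. The following is a minimal sketch of one way to get NA instead. The third group "c" and the helper max_or_na are made up for illustration and are not part of data.table or the original question.

```r
library(data.table)

# Sample data with an extra group "c" that has no "Y" rows (assumed for illustration)
dt <- data.table(
  group = c("a", "a", "b", "b", "c"),
  cond  = c("N", "Y", "Y", "N", "N"),
  value = c(2, 1, 4, 5, 7)
)

# Hypothetical helper: max() that returns NA instead of -Inf for empty input
max_or_na <- function(x) if (length(x) > 0) max(x) else NA_real_

# Same pattern as above, with the guarded max
dt[, max := max_or_na(value[cond == "Y"]), by = group]
dt
```

Returning NA_real_ keeps the column type consistent across groups, which data.table requires when assigning by group.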



Answer 2:


You could do...

dt[CJ(group = group, cond = "Y", unique = TRUE), on = .(group, cond),
   .(mv = max(value)),
   by = .EACHI]

#    group cond mv
# 1:     a    Y  1
# 2:     b    Y  4

A join like this is expected to eventually benefit from optimization of the max calculation.


Another way (originally included in @akrun's answer):

dt[cond == "Y", mv := max(value), by=group]

According to the link in the original answer, this form is already optimized, except for the := part.
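Note that subsetting in i assigns mv only to the rows where cond == "Y"; the other rows are left as NA, which is not quite the output requested in the question. As a rough sketch, one way to fill every row is to aggregate the "Y" rows first and then use an update join. The names agg and mv_all are chosen here for illustration.

```r
library(data.table)

dt <- data.table(
  group = c("a", "a", "a", "b", "b", "b"),
  cond  = c("N", "Y", "N", "Y", "Y", "N"),
  value = c(2, 1, 3, 4, 2, 5)
)

# Subset in i: only the "Y" rows receive a value, the rest stay NA
dt[cond == "Y", mv := max(value), by = group]

# Aggregate the "Y" rows per group, then join the result back onto all rows
agg <- dt[cond == "Y", .(mv_all = max(value)), by = group]
dt[agg, on = "group", mv_all := i.mv_all]
dt
```

Here mv is NA on the "N" rows, while mv_all carries the per-group max of the "Y" rows on every row. A group with no "Y" rows would be absent from agg and so would keep NA in mv_all.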



Source: https://stackoverflow.com/questions/54911691/max-by-group-with-condition-for-a-data-table
