Can the dplyr package be used for conditional mutating?

庸人自扰 2020-11-22 15:10

Can mutate be used when the mutation is conditional (that is, when it depends on the values of certain columns)?

This example helps show what I mean.

5 Answers
  •  感动是毒
    2020-11-22 15:35

    Use ifelse

    df %>%
      mutate(g = ifelse(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
                   ifelse(a == 0 | a == 1 | a == 4 | a == 3 |  c == 4, 3, NA)))
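
    To try the snippet above, here is a minimal sketch on a small made-up data frame. The values in df are hypothetical (they are not the question's data); only the column names a, b and c come from the answer.

    ```r
    library(dplyr)

    # Hypothetical data, chosen so that every branch is exercised at least once.
    df <- data.frame(
      a = c(2, 1, 1, 6, 6),
      b = c(0, 4, 0, 0, 0),
      c = c(0, 0, 0, 4, 0)
    )

    # Nested ifelse(): first test gives 2, second test gives 3, otherwise NA.
    out <- df %>%
      mutate(g = ifelse(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
                   ifelse(a == 0 | a == 1 | a == 4 | a == 3 | c == 4, 3, NA)))

    print(out)
    ```

    With this data the first two rows should get 2, the next two get 3, and the last row, which matches neither test, gets NA.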
    

    Added - if_else: dplyr 0.5 introduced an if_else function, so an alternative is to replace ifelse with if_else. Note, however, that if_else is stricter than ifelse (both legs of the condition must have the same type), so the NA would have to be replaced with NA_real_.

    df %>%
      mutate(g = if_else(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4), 2,
                   if_else(a == 0 | a == 1 | a == 4 | a == 3 |  c == 4, 3, NA_real_)))
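
    To illustrate what "stricter" means, here is a small sketch comparing the two functions on a deliberately mismatched pair of legs. The vector cond and the values 2 and "x" are made up for the demonstration. As far as I know, recent dplyr releases (1.1.0 and later) also accept a bare NA, so the mismatch below uses a number and a string, which should fail in old and new versions alike.

    ```r
    library(dplyr)

    cond <- c(TRUE, FALSE)  # hypothetical test vector

    # Base ifelse() silently coerces mixed legs to a common type (character here).
    loose <- ifelse(cond, 2, "x")

    # if_else() refuses to mix a number with a string and signals an error;
    # tryCatch() captures the error object so the script keeps running.
    strict <- tryCatch(if_else(cond, 2, "x"), error = function(e) e)

    # With matching types, a typed missing value works as described above.
    ok <- if_else(cond, 2, NA_real_)

    print(loose)
    print(inherits(strict, "error"))
    print(ok)
    ```

    The silent coercion in ifelse is the kind of surprise that if_else is designed to surface early.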
    

    Added - case_when: Since this question was posted, dplyr has added case_when, so another alternative would be:

    df %>% mutate(g = case_when(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
                                a == 0 | a == 1 | a == 4 | a == 3 |  c == 4 ~ 3,
                                TRUE ~ NA_real_))
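
    One thing worth knowing about case_when is that the first matching condition wins, so the order of the clauses matters whenever two conditions can both be true for a row. The sketch below uses a hypothetical three-row data frame (not the question's data) in which the first row satisfies both conditions.

    ```r
    library(dplyr)

    # Hypothetical data: row 1 has a == 1 and b == 4, so it matches both clauses.
    df <- data.frame(a = c(1, 1, 6), b = c(4, 0, 0), c = c(0, 0, 0))

    # Clause order as in the answer: the "2" clause is tested first.
    as_written <- df %>%
      mutate(g = case_when(a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
                           a == 0 | a == 1 | a == 4 | a == 3 | c == 4 ~ 3,
                           TRUE ~ NA_real_))

    # Same clauses swapped: the "3" clause now catches row 1 first.
    swapped <- df %>%
      mutate(g = case_when(a == 0 | a == 1 | a == 4 | a == 3 | c == 4 ~ 3,
                           a == 2 | a == 5 | a == 7 | (a == 1 & b == 4) ~ 2,
                           TRUE ~ NA_real_))

    print(as_written$g)
    print(swapped$g)
    ```

    Keeping the clauses in the same order as the nested ifelse calls preserves the original behaviour.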
    

    Added - arithmetic/na_if: If the values are numeric and the conditions (except for the default value of NA at the end) are mutually exclusive, as is the case in the question, then we can use an arithmetic expression in which each condition is multiplied by its desired result, with na_if at the end replacing 0 with NA.

    df %>%
      mutate(g = 2 * (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)) +
                 3 * (a == 0 | a == 1 | a == 4 | a == 3 |  c == 4),
             g = na_if(g, 0))
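
    The mutual-exclusivity requirement is easy to overlook, so here is a small sketch of what happens when it does not hold. The data frame is hypothetical (not the question's data): its first row has a == 1 and b == 4, which satisfies both conditions, so the two terms add up instead of the first one winning.

    ```r
    library(dplyr)

    # Hypothetical data: row 1 matches both conditions, row 2 only the second,
    # row 3 neither.
    df <- data.frame(a = c(1, 1, 6), b = c(4, 0, 0), c = c(0, 0, 0))

    out <- df %>%
      mutate(g = 2 * (a == 2 | a == 5 | a == 7 | (a == 1 & b == 4)) +
                 3 * (a == 0 | a == 1 | a == 4 | a == 3 | c == 4),
             g = na_if(g, 0))  # rows matching nothing sum to 0, which becomes NA

    print(out$g)
    ```

    Row 1 ends up as 2 + 3 = 5 rather than 2, so on data like this the ifelse, if_else or case_when forms are the safer choice.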
    
