dplyr mutate with conditional values

我寻月下人不归 2020-11-27 10:34

In a large data frame ("myfile") with four columns, I have to add a fifth column whose values depend conditionally on the first four columns.


3 answers
  •  独厮守ぢ
    2020-11-27 10:54

    With dplyr 0.7.2, you can use the very useful case_when() function:

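    library(dplyr)

    # The header line has one field fewer than the data rows, so
    # read.table() uses the first column as row names.
    x <- read.table(
     text="V1 V2 V3 V4
     1  1  2  3  5
     2  2  4  4  1
     3  1  4  1  1
     4  4  5  1  3
     5  5  5  5  4")
    # Conditions are checked in order; the first one that is TRUE sets the value.
    x$V5 <- case_when(x$V1==1 & x$V2!=4 ~ 1,
                      x$V2==4 & x$V3!=1 ~ 2,
                      TRUE ~ 0)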
    

    Expressed with dplyr::mutate, it gives:

    x = x %>% mutate(
         V5 = case_when(
             V1==1 & V2!=4 ~ 1,
             V2==4 & V3!=1 ~ 2,
             TRUE ~ 0
         )
    )
    

    Please note that case_when() does not treat NA specially, which can be misleading. A condition that evaluates to NA simply counts as not matched, and the function returns NA only when no condition matches at all. If you include a TRUE ~ ... line, like I did in my example, the return value will never be NA.
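
    To make this behaviour concrete, here is a minimal sketch. The vector v is a made-up example, not data from the question; it only illustrates how a missing value falls through to the fallback line.

    ```r
    library(dplyr)

    # Hypothetical toy vector with one missing value (not from the question).
    v <- c(1, NA, 3)

    # NA == 1 evaluates to NA, which case_when() treats as "not matched",
    # so the NA element falls through to the TRUE ~ 0 fallback.
    with_fallback <- case_when(v == 1 ~ 1, TRUE ~ 0)

    # Without a fallback, every unmatched element (including the NA one)
    # becomes NA in the result.
    without_fallback <- case_when(v == 1 ~ 1)

    print(with_fallback)
    print(without_fallback)
    ```

    With the fallback line the missing input silently becomes 0, which is probably not what you want if NA should stay NA.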

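    Therefore, you have to explicitly tell case_when() to put NA where it belongs by adding a condition such as is.na(x$V1) | is.na(x$V3) ~ NA_real_ before the other conditions, since the first matching condition wins. NA_real_ matches the type of the other right-hand sides here (1, 2 and 0 are doubles). Hint: the dplyr::coalesce() function can be really useful here sometimes!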

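    Moreover, please note that a bare NA will usually not work in this version: you have to use a typed NA value (NA_integer_, NA_character_ or NA_real_) that matches the type of the other right-hand sides. Recent dplyr releases (1.1.0 and later) relax this and accept a plain NA.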
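
    As a rough sketch of the explicit NA handling described above, assuming a small made-up data frame (the values below are hypothetical and only there to trigger each branch): the is.na() condition goes first so that it wins over the later ones, and NA_real_ is used because the other right-hand sides are doubles.

    ```r
    library(dplyr)

    # Hypothetical data with missing values in V1 and V3.
    x <- data.frame(
      V1 = c(1, NA, 1, 2),
      V2 = c(2, 4, 4, 4),
      V3 = c(3, 4, NA, 4)
    )

    x <- x %>% mutate(
      V5 = case_when(
        is.na(V1) | is.na(V3) ~ NA_real_,  # checked first, so NA is kept
        V1 == 1 & V2 != 4 ~ 1,
        V2 == 4 & V3 != 1 ~ 2,
        TRUE ~ 0
      )
    )

    # coalesce() replaces the remaining NAs with a default value (-1 is arbitrary).
    filled <- coalesce(x$V5, -1)

    print(x)
    print(filled)
    ```

    Whether to keep the NAs or replace them afterwards with coalesce() depends on what a missing input should mean in your data.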
