Replace a subset of a data frame with dplyr join operations

栀梦 2020-12-16 14:05

Suppose I applied a treatment to some column values of a data frame like this:

  id animal weight   height ...
  1    dog     23.0
  2    cat     NA
  3           


        
4 Answers
  •  北海茫月
    2020-12-16 14:30

    What you describe is a join in which some values of the original dataset are updated. data.table handles this concisely and efficiently, thanks to its fast joins and its update-by-reference operator (:=).

    Here's an example for your toy data:

    library(data.table)
    setDT(df)             # convert to data.table without copy
    setDT(sub_df)         # convert to data.table without copy
    
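    # update join: for rows of "df" that match "sub_df" on id and animal,
    # set weight to the value from "sub_df" (i.weight); "df" is modified
    # by reference, i.e. without copy
    df[sub_df, on = c("id", "animal"), weight := i.weight]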
    

    The data is now updated:

    #   id animal weight
    #1:  1    dog   23.0
    #2:  2    cat    2.2
    #3:  3   duck    1.2
    #4:  4  fairy    0.2
    #5:  5  snake    1.3
    

    You can use setDF() to convert back to an ordinary data.frame.
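
For anyone who wants to run the steps above end to end, here is a minimal reproducible sketch. The question's toy data is cut off, so the contents of `df` and `sub_df` below are assumptions: the NA positions and the replacement rows are hypothetical, chosen only so that the result matches the updated table shown above.

```r
library(data.table)

# Hypothetical inputs (assumed, not taken from the question): the rows with
# NA weights are the ones that sub_df is meant to fill in.
df <- data.frame(
  id     = 1:5,
  animal = c("dog", "cat", "duck", "fairy", "snake"),
  weight = c(23.0, NA, 1.2, NA, 1.3),
  stringsAsFactors = FALSE
)
sub_df <- data.frame(
  id     = c(2L, 4L),
  animal = c("cat", "fairy"),
  weight = c(2.2, 0.2),
  stringsAsFactors = FALSE
)

setDT(df)      # convert in place, no copy
setDT(sub_df)

# Update join: rows of df that match sub_df on (id, animal) take sub_df's
# weight; rows without a match keep their current value.
df[sub_df, on = c("id", "animal"), weight := i.weight]

setDF(df)      # back to a plain data.frame, again without a copy
print(df)
```

Only the matched rows are touched, which is why the weights for dog, duck and snake are unchanged. If you would rather stay in dplyr, as the question title suggests, `dplyr::rows_update()` (available from dplyr 1.0.0) does a comparable keyed update, though for data frames it returns a modified copy instead of updating in place.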
