Replace duplicate values with NA in time series data using dplyr

孤街浪徒 2020-12-19 13:58

My data seems a bit different from the data in other, similar posts.

box_num      date       x        y
1-Q      2018-11-18   20.2      8
1-Q      2018-11-25   21         


        
2 Answers
  •  一向 (OP)
    2020-12-19 14:37

    Using dplyr, we can group by box_num and, for the x and y columns, replace every value that repeats an earlier value in the same group with NA.

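    library(dplyr)
    
    # funs() is deprecated (since dplyr 0.8.0); across() (dplyr >= 1.0.0)
    # is the current replacement for mutate_at()
    df %>%
      group_by(box_num) %>%
      # within each box_num, set repeats of an earlier x or y value to NA
      mutate(across(x:y, ~ replace(.x, duplicated(.x), NA))) %>%
      ungroup()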
    
    
    # box_num date          x     y
    #1 1-Q     2018-11-18 20.2    8   
    #2 1-Q     2018-11-25 21.2    7.2 
    #3 1-Q     2018-12-2  NA     23   
    #4 98-L    2018-11-25  0.134  9.3 
    #5 98-L    2018-12-2  NA      4   
    #6 76-GI   2018-12-2  22.7    4.56
    #7 76-GI   2018-12-9  28     NA  
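    The dplyr pipeline applies replace(x, duplicated(x), NA) to each column within each group. A minimal base R sketch on a made-up vector (the values are hypothetical, not the asker's data) shows what that expression does on its own: duplicated() is TRUE for every value that already appeared earlier in the vector, even when the repeat is not adjacent, and replace() puts NA at exactly those positions.

```r
# Hypothetical vector, for illustration only (not the asker's data)
v <- c(20.2, 21, 21, 20.2, 22)

# duplicated() is TRUE for each value already seen earlier in the vector,
# whether or not the repeat sits right next to the first occurrence
dup <- duplicated(v)
print(dup)

# replace() swaps the flagged positions for NA; first occurrences survive
cleaned <- replace(v, dup, NA)
print(cleaned)
```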
    

    A base R option (which might not be the best choice here) would be:

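    cols <- c("x", "y")
    # lapply() keeps each column's own type; sapply() would first simplify
    # the result to a matrix
    df[cols] <- lapply(df[cols], function(col)
                  ave(col, df$box_num, FUN = function(v) replace(v, duplicated(v), NA)))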
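    To see the base R option run end to end, here is a self-contained sketch on a small invented data frame. The column names box_num, date, x and y mirror the question, but all values are hypothetical, and the helper blank_dups_by_group() is a name introduced here for illustration, not part of any package.

```r
# Hypothetical toy data with the question's column names (values invented)
df <- data.frame(
  box_num = c("1-Q", "1-Q", "1-Q", "98-L", "98-L"),
  date    = c("2018-11-18", "2018-11-25", "2018-12-02", "2018-11-25", "2018-12-02"),
  x       = c(1, 2, 1, 1, 1),
  y       = c(3, 3, 4, 6, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within each group, set values that repeat an
# earlier value of the same group to NA
blank_dups_by_group <- function(col, group) {
  ave(col, group, FUN = function(v) replace(v, duplicated(v), NA))
}

cols <- c("x", "y")
# lapply() passes each column as `col`; the result stays a list of vectors
df[cols] <- lapply(df[cols], blank_dups_by_group, group = df$box_num)
print(df)
```

    Because the grouping is by box_num, the first 1 in the 98-L rows is kept even though 1 already occurred under 1-Q; only its second appearance within 98-L becomes NA.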
    
