R - delete consecutive (ONLY) duplicates

长情又很酷 2020-12-11 06:55

I need to eliminate rows from a data frame based on the repetition of values in a given column, but only those that are consecutive. For example, for the following data frame

4 answers
  •  南笙 (OP)
    2020-12-11 07:17

    A cheap solution with dplyr that I could think of. It keeps the last row of each run of consecutive duplicate x values:

    Method:

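    library(dplyr)
    df %>% 
      mutate(id = lag(x, 1),                              # x of the previous row (NA for the first row)
             decision = if_else(x != id, 1, 0),           # 1 where x changes from the previous row
             final = lead(decision, 1, default = 1)) %>%  # 1 if the next row starts a new run; the last row always gets 1
      filter(final == 1) %>%                              # keep only the last row of each run
      select(-id, -decision, -final)                      # drop the helper columns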
    

    Output:

      x  y z
    1 1 30 3
    2 2 49 5
    3 4 13 6
    4 2 49 8
    5 1 30 9
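To see why the filter keeps the last row of each run, here is a dplyr-free sketch of the same comparison in base R. It assumes df is the nine-row frame that appears as the first nine rows of the df2 printout further down (the question's own example was cut off); next_differs plays the role of the final column.

```r
# Assumption: df reconstructed from the first nine rows of the df2 printout
df <- data.frame(
  x = c(1, 1, 1, 2, 2, 4, 2, 2, 1),
  y = c(10, 11, 30, 12, 49, 13, 12, 49, 30),
  z = 1:9
)

n <- nrow(df)
# TRUE where the next row's x differs from this row's x.
# The last row has no successor, so it is always kept,
# mirroring lead(decision, 1, default = 1) in the dplyr pipeline.
next_differs <- c(df$x[-1] != df$x[-n], TRUE)

result <- df[next_differs, ]
rownames(result) <- NULL  # renumber rows 1..k like the dplyr output
print(result)
```

This prints the same five rows (z = 3, 5, 6, 8, 9) as the dplyr Output above.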
    

    This even works if your data ends with a repeated x value, because lead(decision, 1, default = 1) gives the final row a 1, so it is always kept:

    New Input:

    df2 <- df %>% add_row(x = 1, y = 10, z = 12)
    df2
    
       x  y  z
    1  1 10  1
    2  1 11  2
    3  1 30  3
    4  2 12  4
    5  2 49  5
    6  4 13  6
    7  2 12  7
    8  2 49  8
    9  1 30  9
    10 1 10 12
    

    Using the same method:

    df2 %>% 
      mutate(id = lag(x, 1), 
             decision = if_else(x != id, 1, 0), 
             final = lead(decision, 1, default = 1)) %>% 
      filter(final == 1) %>% 
      select(-id, -decision, -final)
    

    New Output:

      x  y  z
    1 1 30  3
    2 2 49  5
    3 4 13  6
    4 2 49  8
    5 1 10 12
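If you need this on other columns, a possible alternative is a small base-R helper built on rle(), which returns the length of each run of equal values; the cumulative sum of those lengths is the index of the last row in each run. This is a swapped-in technique, not the lag()/lead() method above, and keep_last_of_runs() is a hypothetical name. df2 is rebuilt from the printout above.

```r
# Hypothetical helper: keeps the last row of every run of equal values in `col`
keep_last_of_runs <- function(data, col) {
  runs <- rle(data[[col]])           # lengths of consecutive runs of equal values
  last_rows <- cumsum(runs$lengths)  # index of the final row in each run
  out <- data[last_rows, , drop = FALSE]
  rownames(out) <- NULL              # renumber rows 1..k
  out
}

# df2 reconstructed from the "New Input" printout
df2 <- data.frame(
  x = c(1, 1, 1, 2, 2, 4, 2, 2, 1, 1),
  y = c(10, 11, 30, 12, 49, 13, 12, 49, 30, 10),
  z = c(1:9, 12)
)

print(keep_last_of_runs(df2, "x"))
```

On df2 this returns the same five rows as the New Output, including the trailing 1 10 12 row. Note that rle() needs an atomic vector, so a factor column would have to be converted with as.character() first.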
    
