R - delete consecutive (ONLY) duplicates

长情又很酷 2020-12-11 06:55

I need to remove rows from a data frame based on repeated values in a given column, but only when the repeats are consecutive. For example, for the following data frame

4 Answers
  •  萌比男神i
    2020-12-11 07:14

    How about:

    df[cumsum(rle(df$x)$lengths),]
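    A self-contained sketch of the one-liner above. The question's data frame was cut off, so the `df` below is a hypothetical example: `x` is the column being de-duplicated and `y` only records the original row number so you can see which rows survive.

    ```r
    # Hypothetical data: runs of 1, 2, 1, 3 in column x
    df <- data.frame(x = c(1, 1, 2, 2, 2, 1, 3, 3), y = 1:8)

    # Keep one row per run of consecutive equal x values.
    # cumsum() of the run lengths points at the LAST row of each run.
    res <- df[cumsum(rle(df$x)$lengths), ]
    print(res)
    ```

    Rows 2, 5, 6 and 8 survive. The second run of `1` is kept even though `1` appeared earlier, because only consecutive repeats are dropped.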
    

    Explanation:

    rle(df$x)
    

    returns an object of class "rle" holding the length and the value of each run of consecutive equal values in the x variable. Then:
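    To make the `rle` step concrete, here is what it returns for a hypothetical column `x` (made up, since the question's data was truncated). `inverse.rle()` is the base R function that undoes the encoding.

    ```r
    x <- c(1, 1, 2, 2, 2, 1, 3, 3)   # hypothetical example column
    r <- rle(x)

    r$lengths   # 2 3 1 2  (how long each run is)
    r$values    # 1 2 1 3  (the value repeated in each run)

    # Expanding the runs again gives back the original vector
    inverse.rle(r)
    ```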

    rle(df$x)$lengths
    

    extracts the lengths. Finally:

    cumsum(rle(df$x)$lengths)
    

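    gives the row index of the last row in each run, which you can then select with [. Note that this keeps the last row of each run of duplicates rather than the first.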
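    If you would rather keep the first row of each run, the same run lengths give the start indices too: each run starts `length - 1` rows before it ends. A minimal sketch on hypothetical data:

    ```r
    df <- data.frame(x = c(1, 1, 2, 2, 2, 1, 3, 3), y = 1:8)  # hypothetical data
    len <- rle(df$x)$lengths

    ends   <- cumsum(len)        # last row of each run
    starts <- ends - len + 1L    # first row of each run

    keep_last  <- df[ends, ]     # same result as the answer's one-liner
    keep_first <- df[starts, ]
    ```

    Computing `starts` from `ends` (rather than prepending `1` to a shifted vector) also behaves sensibly on a zero-row data frame, where both vectors are empty.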

    EDIT: For fun, here's a microbenchmark of the answers given so far. rle is mine; consec is the answer given by @James, which I think is the most fundamentally direct one and the one I would "accept"; and dp is the dplyr answer given by @Nik.

    #> Unit: microseconds
    #>    expr       min         lq       mean     median         uq        max
    #>     rle   134.389   145.4220   162.6967   154.4180   172.8370    375.109
    #>  consec   111.411   118.9235   136.1893   123.6285   145.5765    314.249
    #>      dp 20478.898 20968.8010 23536.1306 21167.1200 22360.8605 179301.213
    

    rle performs better than I thought it would.
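    The consec code by @James isn't reproduced in this thread, so the following is only a hypothetical base-R sketch of the direct idea: keep a row when its value differs from the row before it. `drop_consecutive_dups` is a made-up helper name, not part of any package.

    ```r
    # Hypothetical helper: keep the first row of each run of equal values in `col`
    drop_consecutive_dups <- function(df, col) {
      x <- df[[col]]
      n <- length(x)
      if (n < 2) return(df)   # zero or one row: nothing to drop
      # TRUE for row 1, then TRUE wherever a value differs from its predecessor
      keep <- c(TRUE, x[-1] != x[-n])
      df[keep, , drop = FALSE]
    }

    df <- data.frame(x = c(1, 1, 2, 2, 2, 1, 3, 3), y = 1:8)  # hypothetical data
    res <- drop_consecutive_dups(df, "x")
    ```

    Unlike `cumsum(rle(df$x)$lengths)`, this keeps the first row of each run. It also does nothing special for missing values: comparing with `NA` yields `NA` in `keep`, which produces an all-`NA` row, so a column containing `NA` needs extra handling.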
