R - delete consecutive (ONLY) duplicates


长情又很酷 2020-12-11 06:55

I need to eliminate rows from a data frame based on the repetition of values in a given column, but only those that are consecutive. For example, for the following data fram

4 answers

萌比男神i (original poster)

2020-12-11 07:14
How about:
```
df[cumsum(rle(df$x)$lengths),]
```
Explanation:
```
rle(df$x)
```
returns the length and value of each run of consecutive equal values in the x column. Then:
```
rle(df$x)$lengths
```
extracts the lengths. Finally:
```
cumsum(rle(df$x)$lengths)
```
gives the index of the last row in each run, which you can pass to `[` to select those rows.
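To make the steps concrete, here is a minimal sketch on a made-up data frame. The question's own data is not shown here, so the columns `x` and `y` and their values are assumptions for illustration only. It also shows how to keep the first row of each run instead of the last, if that is what you need.

```r
# Hypothetical sample data; not the data frame from the question.
df <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 1, 1),
  y = 10:17
)

runs <- rle(df$x)
# runs$lengths is 3 2 1 2 and runs$values is 1 2 3 1

# Cumulative run lengths give the index of the LAST row of each run.
last_idx <- cumsum(runs$lengths)            # 3 5 6 8

# Subtracting the run length and adding 1 gives the FIRST row of each run.
first_idx <- last_idx - runs$lengths + 1L   # 1 4 6 7

df_last  <- df[last_idx, ]   # one row per run, keeping the last occurrence
df_first <- df[first_idx, ]  # one row per run, keeping the first occurrence
```

Note that the value 1 appears twice in the result, because only consecutive repeats are collapsed; the later run of 1s is kept as its own row.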

EDIT: For fun, here's a microbenchmark of the answers given so far. rle is mine; consec is the answer given by @James, which I think is the most fundamentally direct one and is the answer I would "accept"; dp is the dplyr answer given by @Nik.
```
#> Unit: microseconds
#>    expr       min         lq       mean     median         uq        max
#>     rle   134.389   145.4220   162.6967   154.4180   172.8370    375.109
#>  consec   111.411   118.9235   136.1893   123.6285   145.5765    314.249
#>      dp 20478.898 20968.8010 23536.1306 21167.1200 22360.8605 179301.213
```
rle performs better than I thought it would.
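The code behind consec is not reproduced in this answer. As a rough guess at what a "direct" approach could look like, the sketch below compares each value with its neighbor in base R and checks that it selects the same rows as the rle version. The sample data and the name `keep_last` are assumptions, and the sketch assumes `x` contains no NA values.

```r
# Hypothetical sample data; not the data frame from the question.
df <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 1, 1),
  y = 10:17
)

# Keep a row when its x differs from the x in the next row.
# The final row has no successor, so it is always kept.
keep_last <- c(df$x[-1] != df$x[-nrow(df)], TRUE)
by_compare <- df[keep_last, ]

# The rle-based selection from this answer, for comparison.
rle_idx <- cumsum(rle(df$x)$lengths)
by_rle <- df[rle_idx, ]

# Both approaches pick the same row positions on this data.
same_rows <- identical(which(keep_last), rle_idx)
```

A comparison like this touches each element once and builds no intermediate run structure, which may be part of why a direct approach can edge out rle in a benchmark.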