Removing Only Adjacent Duplicates in Data Frame in R

南笙 2021-01-12 21:20

I have a data frame in R that is supposed to have duplicates. However, there are some duplicates that I would need to remove. In particular, I only want to

3 Answers

旧时难觅i (original poster)

2021-01-12 21:24
Here's an rle solution:
```
# Keep the last row of each run of consecutive identical values in df$x
df[cumsum(rle(as.character(df$x))$lengths), ]
#    x  y
# 1  A  1
# 2  B  2
# 3  C  3
# 4  A  4
# 5  B  5
# 6  C  6
# 7  A  7
# 9  B  9
# 10 C 10
```
Explanation:

RLE stands for run-length encoding. rle() returns a list of two vectors: values, which holds the value of each run, and lengths, which holds the number of consecutive repeats of each value. For example, x <- c(3, 2, 2, 3) has values c(3, 2, 3) and lengths c(1, 2, 1). The cumulative sum of the lengths is c(1, 3, 4), which is the index of the last element of each run. Subset x with this vector and you get c(3, 2, 3). Note that the second element of the cumulative sum, 3, points to the third element of x, which is the last occurrence of 2 in that particular run.
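
The steps in the explanation can be checked directly in base R. The sketch below reuses the x vector from the explanation. The small data frame and the names last_idx, first_idx, keep_last, and keep_first are assumptions made up for illustration, not the asker's data.

```r
# Worked example from the explanation above.
x <- c(3, 2, 2, 3)
r <- rle(x)
r$values                       # value of each run
r$lengths                      # number of repeats in each run

# Index of the LAST element of each run.
last_idx <- cumsum(r$lengths)
x[last_idx]

# Variant (not part of the original answer): index of the FIRST element of each run.
first_idx <- last_idx - r$lengths + 1

# Hypothetical data frame; rows 2 and 3 are adjacent duplicates in column x.
df <- data.frame(x = c("A", "B", "B", "C", "A"), y = 1:5,
                 stringsAsFactors = FALSE)

# as.character() guards against df$x being a factor, which rle() rejects.
runs <- rle(as.character(df$x))
keep_last  <- df[cumsum(runs$lengths), ]
keep_first <- df[cumsum(runs$lengths) - runs$lengths + 1, ]
```

With this data, keep_last drops row 2 and keep_first drops row 3. The non-adjacent repeat of "A" in row 5 survives in both, which is the behavior the question title asks for.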