Removing Only Adjacent Duplicates in Data Frame in R

南笙 2021-01-12 21:20

I have a data frame in R that is supposed to have duplicates. However, there are some duplicates that I would need to remove. In particular, I only want to

3 Answers
  •  旧时难觅i
    2021-01-12 21:24

    Here's an rle solution:

    df[cumsum(rle(as.character(df$x))$lengths), ]
    #    x  y
    # 1  A  1
    # 2  B  2
    # 3  C  3
    # 4  A  4
    # 5  B  5
    # 6  C  6
    # 7  A  7
    # 9  B  9
    # 10 C 10
    

    Explanation:

    RLE stands for run-length encoding. rle() returns a list of two vectors: values, which holds the value of each run, and lengths, which holds the number of consecutive repeats of that value. For example, x <- c(3, 2, 2, 3) has values c(3, 2, 3) and lengths c(1, 2, 1). The cumulative sum of the lengths is c(1, 3, 4), which gives the index of the last element of each run. Subset x with this vector and you get c(3, 2, 3). Note that the second element of the cumulative sum, 3, points to the third element of x, which is the last occurrence of 2 in that run. Applied to the rows of df, the same indices keep the last row of each run of adjacent duplicates in x.
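
    The steps in the explanation can be traced in a short sketch. The original df is not shown in the question, so the data frame below is a hypothetical one chosen to match the printed output (rows 8 and 9 are assumed to be adjacent duplicates in x). The keep_first variant at the end is an addition for illustration and is not part of the answer above.

    ```r
    # Worked example from the explanation
    x <- c(3, 2, 2, 3)
    r <- rle(x)
    r$values                        # value of each run
    r$lengths                       # length of each run
    last_idx <- cumsum(r$lengths)   # index of the last element of each run
    x[last_idx]

    # Hypothetical data frame (assumed; the question's df is not shown)
    df <- data.frame(
      x = c("A", "B", "C", "A", "B", "C", "A", "B", "B", "C"),
      y = 1:10,
      stringsAsFactors = FALSE
    )

    # Keep the last row of each run of adjacent duplicates in x
    keep_last <- cumsum(rle(as.character(df$x))$lengths)
    df[keep_last, ]

    # Variant: keep the first row of each run instead
    run_len <- rle(as.character(df$x))$lengths
    keep_first <- cumsum(run_len) - run_len + 1
    df[keep_first, ]
    ```

    Non-adjacent repeats such as the A in rows 1, 4 and 7 are untouched, because each one forms its own run of length 1.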
