Question
In R, I'm looking to keep only the first b and the first c after each a, and remove any later b or c in that run (please note the numbering).
I've got the following:
1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
10 b
11 c
12 a
13 b
14 c
15 c
I'm looking to reduce it to:
1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
12 a
13 b
14 c
I'm trying to do this within a dplyr pipe if possible.
Any ideas?
Answer 1:
One possible solution:
df = read.table(text="1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
10 b
11 c
12 a
13 b
14 c
15 c", header = FALSE)
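library(dplyr)
# x counts the a's seen so far, so every a starts a new group
df %>%
  mutate(x = cumsum(V2 == 'a')) %>%
  group_by(x) %>%
  # keep only the first occurrence of each value within a group
  filter(!duplicated(V2)) %>%
  ungroup() %>%
  select(-x)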
Output:
# A tibble: 12 x 2
V1 V2
<int> <fctr>
1 1 a
2 2 b
3 3 c
4 4 a
5 5 b
6 6 c
7 7 a
8 8 b
9 9 c
10 12 a
11 13 b
12 14 c
Note that this removes every duplicated element within each group that starts at an a. If you only want to remove duplicated b's and c's, consider: filter(!(duplicated(V2) & (V2=='b' | V2=='c')))
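To make the difference between the two filters concrete, here is a minimal sketch. The data frame is hypothetical (it adds a letter d that does not appear in the question), and the names grouped, all_dups and bc_only are only illustrative.

```r
library(dplyr)

# Hypothetical data: an extra letter "d" shows where the two filters differ
df <- data.frame(V1 = 1:7,
                 V2 = c("a", "b", "d", "d", "c", "b", "a"),
                 stringsAsFactors = FALSE)

# x is the running count of "a" rows, so each "a" starts a new group
grouped <- df %>%
  mutate(x = cumsum(V2 == "a")) %>%
  group_by(x)

# Drops every repeated value within a group (the second "d" and the second "b")
all_dups <- grouped %>%
  filter(!duplicated(V2)) %>%
  ungroup() %>%
  select(-x)

# Drops only repeated "b" and "c" rows within a group (the second "d" stays)
bc_only <- grouped %>%
  filter(!(duplicated(V2) & (V2 == "b" | V2 == "c"))) %>%
  ungroup() %>%
  select(-x)
```

With only a, b and c in the data, as in the question, both filters should give the same result, because each group contains exactly one a.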
Answer 2:
How about this?
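library(dplyr)
d <- data.frame(lets = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "b", "c", "a", "b", "c", "c"))
# compare each row with the two rows before it
d %>%
  mutate(lag1 = lag(lets),
         lag2 = lag(lag1)) %>%
  # keep the first two rows, then only rows whose 3-row window has no repeats
  filter(is.na(lag2) |
           !(lets == lag1 | lets == lag2 | lag1 == lag2)) %>%
  select(lets)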
Output:
lets
1 a
2 b
3 c
4 a
5 b
6 c
7 a
8 b
9 c
10 a
11 b
12 c
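The lag-based filter looks at a window of three consecutive rows, so it seems to rely on the data following a strict a, b, c pattern. As a rough check, the sketch below runs it on a hypothetical input in which b repeats before c appears, next to the grouping approach with cumsum. The names lag_result and grp_result are only illustrative.

```r
library(dplyr)

# Hypothetical input where "b" repeats before "c" appears
d <- data.frame(lets = c("a", "b", "b", "c"), stringsAsFactors = FALSE)

# Lag-based filter: compares each row with the two rows before it
lag_result <- d %>%
  mutate(lag1 = dplyr::lag(lets),
         lag2 = dplyr::lag(lag1)) %>%
  filter(is.na(lag2) |
           !(lets == lag1 | lets == lag2 | lag1 == lag2)) %>%
  select(lets)

# Grouping-based filter: a new group starts at every "a"
grp_result <- d %>%
  mutate(x = cumsum(lets == "a")) %>%
  group_by(x) %>%
  filter(!duplicated(lets)) %>%
  ungroup() %>%
  select(lets)
```

On this input the lag-based filter also drops the c, because the two rows before it are both b, while the grouping approach keeps a, b, c. For the data in the question the two approaches agree.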
Source: https://stackoverflow.com/questions/48528734/remove-duplicates-based-on-order