Question
I have a data set similar to the following:
A B C
1 10 5
1 20 1
2 30 1
2 30 1
I'd like to add a column D that returns 1 for each row, except when a row duplicates an earlier combination of A & B; in that case I need to return 0, but only for the second instance, so:
A B C D
1 10 5 1
1 20 1 1
2 30 1 1
2 30 1 0
Any help appreciated.
Answer 1:
One option is to flag the first occurrence of each A & B combination with duplicated():
df$D <- as.integer(!duplicated(df[c("A", "B")]))
df$D
#[1] 1 1 1 0
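For a reproducible check, the sample data from the question can be rebuilt by hand. This is only a sketch: the name df comes from the answer above, but the assumption that the data is stored as a plain data.frame with numeric columns is mine.

```r
# Rebuild the sample data from the question (assumed to be a plain data.frame)
df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(10, 20, 30, 30),
  C = c(5, 1, 1, 1)
)

# duplicated() is FALSE for the first occurrence of each (A, B) pair
# and TRUE for every later repeat; negate it and convert to 0/1
df$D <- as.integer(!duplicated(df[c("A", "B")]))
df$D
```

Note that duplicated() marks every repeat after the first, so a third or fourth occurrence of the same pair would also get 0 with this approach.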
Answer 2:
Just a doodle with library(dplyr):
df %>% group_by(A,B) %>% mutate(D = +((1:n())==1))
Or, if you want it to be zero "only for the second instance", meaning the third instance would be 1 again, then the following works:
df %>% group_by(A,B) %>% mutate(D = +!((1:n())==2))
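The difference between the two variants only shows up when a pair occurs three or more times. Here is a rough base R sketch of the same idea, using ave() in place of group_by(); the data frame df3 and the variable names are made up for illustration.

```r
# Hypothetical data: the pair (2, 30) appears three times
df3 <- data.frame(A = c(1, 2, 2, 2), B = c(10, 30, 30, 30))

# Position of each row within its (A, B) group: 1, 2, 3, ...
pos <- ave(seq_len(nrow(df3)), df3$A, df3$B, FUN = seq_along)

first_only  <- as.integer(pos == 1)  # 1 for the first instance, 0 for every repeat
second_only <- as.integer(pos != 2)  # 0 only for the second instance

first_only   # 1 1 0 0
second_only  # 1 1 0 1
```

Which of the two is wanted depends on how "only for the second instance" in the question is meant to be read.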
In the example, your duplicates are not only in A and B but also in C. If that's actually the case, you can use group_by_all instead of group_by(A,B).
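To see when the choice of key columns matters, here is a small base R sketch comparing a check on A and B with a check on every column. The data frame df2 is invented for this illustration; it is not the asker's data.

```r
# Hypothetical data: rows 2-4 share A and B, but row 3 differs in C
df2 <- data.frame(
  A = c(1, 2, 2, 2),
  B = c(10, 30, 30, 30),
  C = c(5, 1, 2, 1)
)

by_ab  <- as.integer(!duplicated(df2[c("A", "B")]))  # keys on A and B only
by_all <- as.integer(!duplicated(df2))               # keys on every column

by_ab   # 1 1 0 0
by_all  # 1 1 1 0
```

Row 3 counts as a duplicate only when C is ignored, so the two keys give different results for it.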
Source: https://stackoverflow.com/questions/56366368/return-0-to-second-instance-of-duplicate