Question
Suppose that there are three variables in my data frame (mydata): 1) id, 2) case, and 3) value.
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), case=c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"), value=c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87))
mydata
   id case value
1   1    a     1
2   1    b    34
3   1    c    56
4   1    c    23
5   1    b    34
6   2    a   546
7   2    b    34
8   2    c    67
9   2    c    23
10  3    a    65
11  3    b    23
12  3    c    65
13  3    c    23
14  4    a    87
15  4    b    34
16  4    c   321
17  4    a    87
Within each id, the same case letter can appear more than once, with the same or a different value. If two rows have the same id, case, and value, I only need to keep one of them and remove the duplicate.
My final data would then be:
   id case value
1   1    a     1
2   1    b    34
3   1    c    56
4   1    c    23
5   2    a   546
6   2    b    34
7   2    c    67
8   2    c    23
9   3    a    65
10  3    b    23
11  3    c    65
12  3    c    23
13  4    a    87
14  4    b    34
15  4    c   321
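In base R, duplicated() can list which rows this rule removes. A minimal sketch on the question's data (dup is just an illustrative name): it flags rows 5 and 17, which are exactly the two rows missing from the desired result.

```r
# The question's data frame
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

# TRUE for every row whose (id, case, value) combination already
# appeared in an earlier row; first occurrences are FALSE
dup <- duplicated(mydata[, c("id", "case", "value")])

which(dup)
# [1]  5 17
```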
Answer 1:
You could try duplicated():
mydata[!duplicated(mydata[,c('id', 'case', 'value')]),]
#    id case value
# 1   1    a     1
# 2   1    b    34
# 3   1    c    56
# 4   1    c    23
# 6   2    a   546
# 7   2    b    34
# 8   2    c    67
# 9   2    c    23
# 10  3    a    65
# 11  3    b    23
# 12  3    c    65
# 13  3    c    23
# 14  4    a    87
# 15  4    b    34
# 16  4    c   321
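The subset above keeps the original row names (1, 2, 3, 4, 6, ...), whereas the desired output in the question is numbered 1 to 15. A sketch, assuming the consecutive numbering matters, that resets the row names. Because id, case and value are all of mydata's columns here, unique() on the whole data frame yields the same rows (res and res2 are illustrative names).

```r
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

# Keep the first occurrence of each (id, case, value) combination
res <- mydata[!duplicated(mydata[, c("id", "case", "value")]), ]

# Drop the inherited row names so the rows are numbered 1..15
rownames(res) <- NULL

# With every column acting as a key, unique() gives the same result
res2 <- unique(mydata)
rownames(res2) <- NULL
identical(res, res2)
# [1] TRUE
```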
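Or use unique() with the by option from data.table. In this example an extra random column, value1, is added to show that columns not listed in by are kept in the result but ignored when looking for duplicates (the first row of each duplicate group is retained):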
library(data.table)
set.seed(25)
mydata1 <- cbind(mydata, value1=rnorm(17))
DT <- as.data.table(mydata1)
unique(DT, by=c('id', 'case', 'value'))
#     id case value      value1
#  1:  1    a     1 -0.21183360
#  2:  1    b    34 -1.04159113
#  3:  1    c    56 -1.15330756
#  4:  1    c    23  0.32153150
#  5:  2    a   546 -0.44553326
#  6:  2    b    34  1.73404543
#  7:  2    c    67  0.51129562
#  8:  2    c    23  0.09964504
#  9:  3    a    65 -0.05789111
# 10:  3    b    23 -1.74278763
# 11:  3    c    65 -1.32495298
# 12:  3    c    23 -0.54793388
# 13:  4    a    87 -1.45638428
# 14:  4    b    34  0.08268682
# 15:  4    c   321  0.92757895
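unique() keeps the first row of each duplicate group, which is why value1 above comes from the earlier copy. If the later copy should win instead, both data.table's unique() and base R's duplicated() accept fromLast = TRUE. A base R sketch, with a hypothetical row_id column added only to show which copy survives:

```r
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

# Hypothetical extra column, used only to show which copy is kept
mydata$row_id <- seq_len(nrow(mydata))

keys <- c("id", "case", "value")

# Default: the first occurrence of each key combination is kept
first <- mydata[!duplicated(mydata[, keys]), ]

# fromLast = TRUE scans from the bottom, so the last occurrence is kept
last <- mydata[!duplicated(mydata[, keys], fromLast = TRUE), ]

setdiff(first$row_id, last$row_id)
# [1]  2 14
setdiff(last$row_id, first$row_id)
# [1]  5 17
```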
Answer 2:
To add to the other answers, here's a dplyr approach:
library(dplyr)
mydata %>% group_by(id, case, value) %>% distinct()
Or
mydata %>% distinct(id, case, value)
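If the data had more columns than the three being compared, distinct(id, case, value) would return only those three. dplyr's .keep_all = TRUE argument keeps the remaining columns, taken from the first row of each group. A sketch with a hypothetical extra column, note:

```r
library(dplyr)

mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

# Hypothetical extra column that is not part of the duplicate check
mydata$note <- paste0("row", seq_len(nrow(mydata)))

# With columns named, distinct() returns only those columns
only_keys <- mydata %>% distinct(id, case, value)
names(only_keys)

# .keep_all = TRUE keeps every column, using the first row of each group
all_cols <- mydata %>% distinct(id, case, value, .keep_all = TRUE)
names(all_cols)
```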
Answer 3:
Want to compare on id, case, and value only? Easy:
mydata[!duplicated(mydata[,c("id","case","value")]),]
Even if you have many more variables in the data set, they won't be considered by the duplicated() call, because only the three selected columns are passed to it.
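A small base R sketch of that point, with a hypothetical weight column that differs on every row: the selected-column check still drops rows 5 and 17 and keeps weight in the result, whereas checking all columns finds no duplicates at all.

```r
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

# Hypothetical extra variable that should NOT affect the duplicate check
mydata$weight <- seq_len(nrow(mydata)) / 10

# Only id, case and value are passed to duplicated()
res <- mydata[!duplicated(mydata[, c("id", "case", "value")]), ]
nrow(res)               # 15 rows kept
ncol(res)               # 4 columns: weight is still there

# Checking every column finds nothing, since weight is unique per row
sum(duplicated(mydata)) # 0
```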
Source: https://stackoverflow.com/questions/27255065/removing-duplicates-for-each-id