Question
I have a DT with three columns; the first two contain various values, grouped by Group.
ID ID_2 Group
23201600101793 2016052016051062331 A
23201600101793 2016062016061017838 A
23201600101794 2016052016051062331 A
23201600101794 2016052016051062402 A
23201600103090 2016052016051062325 A
23201600103090 2016052016051062408 A
23201600803366 2016052016051062325 A
23201600803366 2016052016051062408 A
I need to find unique combinations of both columns, without repeated values in either column. My desired output for Group A is:
ID ID_2 Group
23201600101793 2016052016051062331 A
23201600101794 2016052016051062402 A
23201600103090 2016052016051062325 A
23201600803366 2016052016051062408 A
Rows 3 and 7 were removed because they repeat the ID_2 values of rows 1 and 5, respectively. Rows 2 and 6 were removed because they repeat the ID values of rows 1 and 5.
There is no fixed pattern within a group; a group can have many rows with the same ID or ID_2.
For example, from group B I need just 2 rows, since ID has two unique values. The selected rows can be the first ones (that is, all the ID_2 rows but the first would be discarded, since the first row has two unique values).
ID ID_2 Group
23201600009182 2016042016041000942 B
23201600009182 2016042016041000943 B
23201600009182 2016042016041000946 B
23201600009182 2016042016041000949 B
23201600009182 2016042016041000950 B
23201600009182 2016042016041000951 B
23201600009182 2016042016041000953 B
23201600009182 2016042016041000954 B
23201600009182 2016042016041000956 B
23201600009182 2016042016041000957 B
23201600009182 2016042016041000958 B
23201600669635 2016052016051003624 B
23201600669635 2016052016051003626 B
23201600669635 2016052016051003628 B
23201600669753 2016012016011000791 B
23201600669753 2016012016011000797 B
Desired output for Group B:
23201600009182 2016042016041000942 B
23201600669635 2016052016051003624 B
I appreciate any help.
Answer 1:
From my understanding, you want each combination of Group and ID to be unique. You can use distinct() from dplyr:
library(dplyr)
# sample data: 10 random rows with IDs in 1..4 and two groups
set.seed(123)
sample_data <- tibble(ID = sample(1:4, size = 10, replace = TRUE),
                      ID2 = sample(1:4, size = 10, replace = TRUE),
                      group = sample(c("A", "B"), size = 10, replace = TRUE))
Sample data:
> sample_data
# A tibble: 10 x 3
      ID   ID2 group
   <int> <int> <chr>
 1     2     4 B
 2     4     2 B
 3     2     3 B
 4     4     3 B
 5     4     1 B
 6     1     4 B
 7     3     1 B
 8     4     1 B
 9     3     2 A
10     2     4 A
# keep the first row for each (ID, group) pair, retaining all columns
distinct(sample_data, ID, group, .keep_all = TRUE)
Sample result:
# A tibble: 6 x 3
     ID   ID2 group
  <int> <int> <chr>
1     2     4 B
2     4     2 B
3     1     4 B
4     3     1 B
5     3     2 A
6     2     4 A
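Note that distinct() on ID and group keeps one row per ID, but it does not stop an ID2 value from repeating. If ID_2 must also be unique within a group, as the question describes, one possible approach is a greedy pass that keeps a row only when neither its ID nor its ID_2 has already been kept. The following is a minimal sketch in base R, not a definitive solution: the helper name pick_unique_pairs is hypothetical, the IDs are assumed to be stored as character (19-digit values cannot be held exactly in a double), and it has only been checked against group A from the question.

```r
# Hypothetical helper (not part of the original answer): walk the rows in
# order and keep a row only if its ID and its ID_2 are both still unused.
pick_unique_pairs <- function(df) {
  keep <- logical(nrow(df))
  seen_id <- character(0)
  seen_id2 <- character(0)
  for (i in seq_len(nrow(df))) {
    if (!(df$ID[i] %in% seen_id) && !(df$ID_2[i] %in% seen_id2)) {
      keep[i] <- TRUE
      seen_id <- c(seen_id, df$ID[i])
      seen_id2 <- c(seen_id2, df$ID_2[i])
    }
  }
  df[keep, , drop = FALSE]
}

# Group A from the question; IDs are character strings (assumption) so that
# the 19-digit ID_2 values are compared exactly.
group_a <- data.frame(
  ID = c("23201600101793", "23201600101793", "23201600101794", "23201600101794",
         "23201600103090", "23201600103090", "23201600803366", "23201600803366"),
  ID_2 = c("2016052016051062331", "2016062016061017838", "2016052016051062331",
           "2016052016051062402", "2016052016051062325", "2016052016051062408",
           "2016052016051062325", "2016052016051062408"),
  Group = "A",
  stringsAsFactors = FALSE
)

# Apply the helper to each group separately, then bind the pieces together.
result <- do.call(rbind, lapply(split(group_a, group_a$Group), pick_unique_pairs))
rownames(result) <- NULL
print(result)
```

For group A this keeps rows 1, 4, 5 and 8 of the input, which matches the desired output in the question. The result depends on row order, and a greedy pass is not guaranteed to keep the largest possible number of rows.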
Source: https://stackoverflow.com/questions/50820902/unique-combination-per-group