Question
I am currently running a randomization in which individuals of a given population are sampled and placed into groups of a defined size. The result is the data frame shown below:
Ind Group
Sally 1
Bob 1
Sue 1
Joe 2
Jeff 2
Jess 2
Mary 2
Jim 3
James 3
Is there a function which will allow me to expand the data set to show every possible within group pairing? (Desired output below). The pairings do not need to be reciprocal.
Group Ind1 Ind2
1 Sally Bob
1 Sally Sue
1 Sue Bob
2 Joe Jeff
2 Joe Jess
2 Joe Mary
2 Jeff Jess
2 Jess Mary
2 Jeff Mary
3 Jim James
I feel like there must be a way to do this in dplyr, but for the life of me I can't seem to sort it out.
Answer 1:
Here is an option using data.table. Convert to a data.table (setDT(dt)), do a cross join (CJ) grouped by 'Group', and remove the duplicated elements:
library(data.table)
setDT(dt)[, CJ(Ind1 = Ind, Ind2 = Ind, unique = TRUE)[Ind1 != Ind2],
Group][!duplicated(data.table(pmax(Ind1, Ind2), pmin(Ind1, Ind2)))]
# Group Ind1 Ind2
#1: 1 Bob Sally
#2: 1 Bob Sue
#3: 1 Sally Sue
#4: 2 Jeff Jess
#5: 2 Jeff Joe
#6: 2 Jeff Mary
#7: 2 Jess Joe
#8: 2 Jess Mary
#9: 2 Joe Mary
#10: 3 James Jim
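The cross-join-then-deduplicate idea above can be sketched in base R for a single group. This is only an illustration, not the data.table code itself: the vector name `members` is hypothetical, and the `Ind1 < Ind2` filter is a swapped-in shortcut that drops self-pairs and reciprocal duplicates in one step instead of using duplicated().

```r
# Hypothetical single-group example; `members` is an illustrative name.
members <- c("Sally", "Bob", "Sue")

# Cross join: every ordered pair, including self-pairs (3 x 3 = 9 rows).
pairs <- expand.grid(Ind1 = members, Ind2 = members, stringsAsFactors = FALSE)

# Keep one row per unordered pair: rows with Ind1 >= Ind2 are either
# self-pairs or the reciprocal of a row that is kept.
pairs <- pairs[pairs$Ind1 < pairs$Ind2, ]

# Sort for readability and reset the row names.
pairs <- pairs[order(pairs$Ind1, pairs$Ind2), ]
rownames(pairs) <- NULL
pairs
```

A group of n members yields choose(n, 2) rows, so this three-member group gives three pairs.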
Or using combn by 'Group':
setDT(dt)[, {temp <- combn(Ind, 2); .(Ind1 = temp[1,], Ind2 = temp[2,])}, Group]
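To see why the code above reads rows 1 and 2 of temp, it may help to look at what combn returns for one group. A minimal sketch, assuming a hypothetical vector `members` holding the names from Group 2:

```r
# Illustrative vector of one group's members (Group 2 from the question).
members <- c("Joe", "Jeff", "Jess", "Mary")

# combn() returns a matrix with one column per combination:
# row 1 holds the first member of each pair, row 2 the second.
temp <- combn(members, 2)
dim(temp)  # 2 rows, choose(4, 2) = 6 columns

# Reading the two rows as columns gives one pair per row.
pairs <- data.frame(Ind1 = temp[1, ], Ind2 = temp[2, ],
                    stringsAsFactors = FALSE)
pairs
```

Because combn preserves input order, the pairs come out in the order the names appear in the data rather than alphabetically.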
Answer 2:
An alternative dplyr & tidyr approach: the pipeline is a little longer, but the wrangling feels more straightforward to me. Start by joining all records in each group together. Next, pool and alphabetize the names in each row so that reciprocal duplicates can be eliminated. Finally, separate the results back into two columns.
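library(dplyr)
library(tidyr)
left_join(dt, dt, by = "Group") %>%       # pair every record with every record in its group
filter(Ind.x != Ind.y) %>%                # drop the self joins
rowwise() %>%
mutate(name = toString(sort(c(Ind.x, Ind.y)))) %>%  # alphabetized "A, B" key per row
ungroup() %>%                             # leave row-wise mode before de-duplicating
select(Group, name) %>%
distinct() %>%                            # reciprocal pairs now share the same key
separate(name, into = c("Ind1", "Ind2")) %>%
arrange(Group, Ind1, Ind2)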
- start off with a weak cross join of all records in each group
- filter out the self joins
- collect up all the names in each row, sort them, and set them down together in the name column
- now that the names are alphabetized, remove the alphabetized reciprocals
- pull the data apart back into separate columns
# A tibble: 10 x 3
#    Group Ind1  Ind2
# *  <int> <chr> <chr>
#  1     1 Bob   Sally
#  2     1 Sally Sue
#  3     1 Bob   Sue
#  4     2 Jeff  Joe
#  5     2 Jess  Joe
#  6     2 Joe   Mary
#  7     2 Jeff  Jess
#  8     2 Jeff  Mary
#  9     2 Jess  Mary
# 10     3 James Jim
Answer 3:
A solution using dplyr. We can use group_by and do to apply the combn function to each group and combine the results to form a data frame.
library(dplyr)
dt2 <- dt %>%
group_by(Group) %>%
do(as_data_frame(t(combn(.$Ind, m = 2)))) %>%
ungroup() %>%
setNames(sub("V", "Ind", colnames(.)))
dt2
# # A tibble: 10 x 3
# Group Ind1 Ind2
# <int> <chr> <chr>
# 1 1 Sally Bob
# 2 1 Sally Sue
# 3 1 Bob Sue
# 4 2 Joe Jeff
# 5 2 Joe Jess
# 6 2 Joe Mary
# 7 2 Jeff Jess
# 8 2 Jeff Mary
# 9 2 Jess Mary
# 10 3 Jim James
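The same split-apply-combine pattern can also be sketched in base R without dplyr. This is an illustrative alternative, not part of the answer above: the helper name `pairs_in_group` is hypothetical, and the guard for groups with fewer than two members is an added assumption, since combn raises an error when asked for 2 items from fewer than 2.

```r
# Sample data from the question, rebuilt here so the sketch is self-contained.
dt <- data.frame(
  Ind = c("Sally", "Bob", "Sue", "Joe", "Jeff", "Jess", "Mary", "Jim", "James"),
  Group = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all unordered pairs within one group's rows.
pairs_in_group <- function(g) {
  if (nrow(g) < 2) {
    # A group with a single member has no pairs; return an empty frame.
    return(data.frame(Group = g$Group[0], Ind1 = character(0),
                      Ind2 = character(0), stringsAsFactors = FALSE))
  }
  m <- combn(g$Ind, 2)  # 2-row matrix, one column per pair
  data.frame(Group = g$Group[1], Ind1 = m[1, ], Ind2 = m[2, ],
             stringsAsFactors = FALSE)
}

# Split by group, build the pairs for each piece, and stack the results.
result <- do.call(rbind, lapply(split(dt, dt$Group), pairs_in_group))
rownames(result) <- NULL
result
```

For the sample data this produces 3 + 6 + 1 = 10 rows, in the order the names appear within each group.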
DATA
dt <- read.table(text = "Ind Group
Sally 1
Bob 1
Sue 1
Joe 2
Jeff 2
Jess 2
Mary 2
Jim 3
James 3",
header = TRUE, stringsAsFactors = FALSE)
Source: https://stackoverflow.com/questions/47276418/expanding-a-list-to-include-all-possible-pairwise-combinations-within-a-group