Expanding a list to include all possible pairwise combinations within a group

Question

I am currently running a randomization where individuals of a given population are sampled and placed into groups of defined size. The result is a data frame seen below:

Ind     Group
Sally   1
Bob     1
Sue     1
Joe     2
Jeff    2
Jess    2
Mary    2
Jim     3
James   3

Is there a function that will let me expand the data set to show every possible within-group pairing (desired output below)? The pairings do not need to be reciprocal.

Group   Ind1    Ind2
1       Sally   Bob
1       Sally   Sue
1       Sue     Bob
2       Joe     Jeff
2       Joe     Jess
2       Joe     Mary
2       Jeff    Jess
2       Jess    Mary
2       Jeff    Mary
3       Jim     James

I feel like there must be a way to do this in dplyr, but for the life of me I can't seem to sort it out.

Answer 1:

Here is an option using data.table. Convert the data frame to a data.table with setDT(dt), do a cross join (CJ) of 'Ind' with itself grouped by 'Group', drop the self-pairs, and then remove the reciprocal duplicates:

library(data.table)
setDT(dt)[, CJ(Ind1 = Ind, Ind2 = Ind, unique = TRUE)[Ind1 != Ind2], 
             Group][!duplicated(data.table(pmax(Ind1, Ind2), pmin(Ind1, Ind2)))]
#   Group  Ind1  Ind2
#1:     1   Bob Sally
#2:     1   Bob   Sue
#3:     1 Sally   Sue
#4:     2  Jeff  Jess
#5:     2  Jeff   Joe
#6:     2  Jeff  Mary
#7:     2  Jess   Joe
#8:     2  Jess  Mary
#9:     2   Joe  Mary
#10:    3 James   Jim
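The same cross-join-then-deduplicate idea can be sketched in base R. This is only an illustration, not part of the original answer: it rebuilds dt inline so it runs on its own, and it assumes names are unique within a group, so that keeping rows with Ind1 < Ind2 removes both the self-pairs and one of each reciprocal pair.

```r
# Same data as in the question, rebuilt inline so the sketch is self-contained.
dt <- data.frame(
  Ind   = c("Sally", "Bob", "Sue", "Joe", "Jeff", "Jess", "Mary", "Jim", "James"),
  Group = rep(1:3, times = c(3, 4, 2)),
  stringsAsFactors = FALSE
)

# Self-join on Group: every individual meets every member of its own group.
# The suffixes turn the two 'Ind' columns into 'Ind1' and 'Ind2'.
pairs <- merge(dt, dt, by = "Group", suffixes = c("1", "2"))

# Assumption: names are unique within a group. Keeping Ind1 < Ind2 then
# drops self-pairs and keeps exactly one of each reciprocal pair.
pairs <- pairs[pairs$Ind1 < pairs$Ind2, ]

# Sort for readability and reset the row names.
pairs <- pairs[order(pairs$Group, pairs$Ind1, pairs$Ind2), ]
rownames(pairs) <- NULL
pairs
```

Filtering on Ind1 < Ind2 replaces the separate pmax/pmin duplicate check, at the cost of always listing each pair in alphabetical order.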

Alternatively, apply combn within each 'Group':

setDT(dt)[, {temp <- combn(Ind, 2); .(Ind1 = temp[1,], Ind2 = temp[2,])}, Group]
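For readers without data.table, the per-group combn idea might look like this in base R. It is a sketch under assumptions: pairs_in_group is a hypothetical helper name, dt is rebuilt inline, and the guard for groups with fewer than two members is an addition that the original answer does not include.

```r
# Same data as in the question, rebuilt inline so the sketch is self-contained.
dt <- data.frame(
  Ind   = c("Sally", "Bob", "Sue", "Joe", "Jeff", "Jess", "Mary", "Jim", "James"),
  Group = rep(1:3, times = c(3, 4, 2)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all unordered pairs within one group's rows.
pairs_in_group <- function(g) {
  if (nrow(g) < 2) {
    # Added guard: a group with a single member has no pairs.
    return(data.frame(Group = g$Group[0], Ind1 = character(0),
                      Ind2 = character(0), stringsAsFactors = FALSE))
  }
  p <- combn(g$Ind, 2)  # 2 x choose(n, 2) matrix, one column per pair
  data.frame(Group = g$Group[1], Ind1 = p[1, ], Ind2 = p[2, ],
             stringsAsFactors = FALSE)
}

# Split by group, build the pairs per group, and stack the pieces.
pairs <- do.call(rbind, lapply(split(dt, dt$Group), pairs_in_group))
rownames(pairs) <- NULL
pairs
```

Because combn never produces a pair twice, no de-duplication step is needed, and the pairs keep the order in which individuals appear in the data.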

Answer 2:

An alternative dplyr and tidyr approach. The pipeline is a little longer, but the wrangling feels more straightforward to me. Start by joining every record in a group to every other record in the same group. Next, sort the two names in each row alphabetically and paste them together, so that reciprocal pairs become identical and can be removed as duplicates. Finally, split the combined names back into separate columns.

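library(dplyr)
library(tidyr)

left_join(dt, dt, by = "Group") %>%                     # pair up records within each group
    filter(Ind.x != Ind.y) %>%                          # drop self-pairs
    rowwise() %>%
    mutate(name = toString(sort(c(Ind.x, Ind.y)))) %>%  # alphabetized "A, B" key
    select(Group, name) %>% 
    distinct() %>%                                      # remove reciprocal duplicates
    separate(name, into = c("Ind1", "Ind2")) %>%        # split the key back apart
    arrange(Group, Ind1, Ind2)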

1. Start with a join of dt to itself on 'Group', which acts as a cross join of all records within each group.
2. Filter out the self-joins.
3. Collect the two names in each row, sort them, and store them together in the name column.
4. Now that the names are alphabetized, remove the reciprocal duplicates.
5. Pull the names apart again into separate columns.

# A tibble: 10 x 3
   Group  Ind1  Ind2
 * <int> <chr> <chr>
 1     1   Bob Sally
 2     1 Sally   Sue
 3     1   Bob   Sue
 4     2  Jeff   Joe
 5     2  Jess   Joe
 6     2   Joe  Mary
 7     2  Jeff  Jess
 8     2  Jeff  Mary
 9     2  Jess  Mary
10     3 James   Jim

Answer 3:

A solution using dplyr. We can use group_by and do to apply the combn function to each group and combine the results to form a data frame.

library(dplyr)
dt2 <- dt %>%
  group_by(Group) %>%
  do(as.data.frame(t(combn(.$Ind, m = 2)), stringsAsFactors = FALSE)) %>%  # columns V1, V2; avoids the deprecated as_data_frame()
  ungroup() %>%
  setNames(sub("V", "Ind", colnames(.)))
dt2

# # A tibble: 10 x 3
#    Group  Ind1  Ind2
#    <int> <chr> <chr>
#  1     1 Sally   Bob
#  2     1 Sally   Sue
#  3     1   Bob   Sue
#  4     2   Joe  Jeff
#  5     2   Joe  Jess
#  6     2   Joe  Mary
#  7     2  Jeff  Jess
#  8     2  Jeff  Mary
#  9     2  Jess  Mary
# 10     3   Jim James
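As a quick sanity check on any of these approaches, the number of rows should equal the sum over groups of choose(n, 2), where n is the group size. The sketch below rebuilds dt inline; the variable names are illustrative only.

```r
# Same data as in the question, rebuilt inline so the sketch is self-contained.
dt <- data.frame(
  Ind   = c("Sally", "Bob", "Sue", "Joe", "Jeff", "Jess", "Mary", "Jim", "James"),
  Group = rep(1:3, times = c(3, 4, 2)),
  stringsAsFactors = FALSE
)

# Group sizes: 3, 4 and 2 individuals.
group_sizes <- as.vector(table(dt$Group))

# Unordered pairs per group: choose(3, 2) = 3, choose(4, 2) = 6, choose(2, 2) = 1.
pairs_per_group <- choose(group_sizes, 2)

# Total number of rows expected in the expanded data set.
expected_rows <- sum(pairs_per_group)
expected_rows
```

For the example data this gives 3 + 6 + 1 = 10, which matches the ten rows in each of the outputs shown.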

DATA

dt <- read.table(text = "Ind Group
Sally   1
Bob 1
Sue 1
Joe 2
Jeff    2
Jess    2
Mary    2
Jim 3
James   3",
                 header = TRUE, stringsAsFactors = FALSE)

Source: https://stackoverflow.com/questions/47276418/expanding-a-list-to-include-all-possible-pairwise-combinations-within-a-group

Tags

dplyr