Find unique pairings of entries in a character vector

Question

I have a vector fruit with three entries: Peach, Plum, and Pear. I would like to find each unique pairing in fruit and create a new two-column data.frame (e.g. df.new below). How might I do this in R for an even larger data set? expand.grid returns both Pear-Plum and Plum-Pear, which are the same pairing for my purposes, so it is not what I am looking for. Any suggestions?

fruit <- c("Peach", "Plum", "Pear")

fruit1 <- c("Peach", "Peach", "Plum")
fruit2 <- c("Plum", "Pear", "Pear")
df.new <- data.frame(fruit1, fruit2)

# df.new
  fruit1 fruit2
1  Peach   Plum
2  Peach   Pear
3   Plum   Pear

# attempt
fruit.y <- fruit
df.expand <- expand.grid(fruit,fruit.y)

Answer 1:

Using your initial strategy, you can still use expand.grid:

fruit_df <- expand.grid(fruit,fruit)

Then sort the two entries within each row and drop the duplicated rows:

fruit_df2 <- as.data.frame(unique(t(apply(fruit_df, 1, function(x) sort(x)))))

     V1    V2
1 Peach Peach
2 Peach  Plum
3 Peach  Pear
4  Plum  Plum
5  Pear  Plum
6  Pear  Pear
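The result above still contains self-pairs such as Peach-Peach, which the question does not ask for. One possible way to drop them, sketched here under the assumption that the entries of fruit are distinct, is to keep only the rows whose first entry sorts strictly before the second. The column names fruit1 and fruit2 and the object names pairs and pairs_unique are chosen for illustration only.

```r
fruit <- c("Peach", "Plum", "Pear")

# Build every ordered pair, keeping the values as character strings
# rather than factors so they can be compared with `<`
pairs <- expand.grid(fruit1 = fruit, fruit2 = fruit, stringsAsFactors = FALSE)

# Keep a row only when the first entry sorts strictly before the second;
# this removes self-pairs and one row of each mirrored pair
pairs_unique <- pairs[pairs$fruit1 < pairs$fruit2, ]

# Reset the row names left over from the subsetting
rownames(pairs_unique) <- NULL
pairs_unique
```

With this filter the entries within each pair are in alphabetical order (Pear-Plum rather than Plum-Pear), so the layout can differ from df.new even though the set of pairings is the same.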

Another strategy is to generate all combinations of two elements of fruit with combn:

combn(fruit,2)

     [,1]    [,2]    [,3]  
[1,] "Peach" "Peach" "Plum"
[2,] "Plum"  "Pear"  "Pear"

To get the output as a data frame, transpose the result and convert it:

as.data.frame(t(combn(fruit,2)))

Note that combn does not produce self-pairs such as Plum-Plum, whereas the expand.grid approach does.
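To reproduce the df.new layout from the question, including its column names, the combn result can be split into two named columns. This is a minimal sketch; pair_matrix is a name introduced here for illustration.

```r
fruit <- c("Peach", "Plum", "Pear")

# combn() returns one pair per column; transpose so each pair is a row
pair_matrix <- t(combn(fruit, 2))

# Name the columns as in the question and keep the values as strings
df.new <- data.frame(
  fruit1 = pair_matrix[, 1],
  fruit2 = pair_matrix[, 2],
  stringsAsFactors = FALSE
)
df.new
```

For a vector of n distinct entries this gives choose(n, 2) rows. If the vector may contain repeated values, calling unique(fruit) before combn is probably needed to avoid repeated pairs.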

Source: https://stackoverflow.com/questions/23024059/find-unique-pairings-of-entries-in-a-character-vector

Tags

vector

dataframe

unique