Couple the data in all possible combinations

大城市里の小女人 提交于 2019-12-25 09:15:50

问题


I have data in column in two columns like this

Id  Value
1   a
2   f
1   c
1   h
2   a

and I'd like couple the data of the 'Value' column in all possible combinations based on the same Id such as

(a,c)
(a,h)
(c,h)
(f,a)

Is there any R or Python or VBA code to get this task?


回答1:


To return a character matrix with these combinations using base R, try

do.call(rbind, t(sapply(split(df, df$Id), function(i) t(combn(i$Value, 2)))))
     [,1] [,2]
[1,] "a"  "c" 
[2,] "a"  "h" 
[3,] "c"  "h" 
[4,] "f"  "a"

Each row is a desired combination.

To break this down a bit, split splits the data.frame by Id into a list of two data.frames. Then sapply is fed this list and the combn function to find the pairwise combinations within these data.frames. The result from each data.frame (which is a matrix) is transposed to fit your desired structure using t. Finally, this list of matrices is fed to do.call which uses rbind to return the final matrix.

Note: There is an assumption that the value column is character (not the pesky factor variable type). This is easily accomplished in the read. family of functions, like read.csv and read.table by adding the as.is=TRUE argument to your read function (or the longer stringsAsFactors=FALSE). If the variable is already a factor, you can wrap the i$Value statement near the end in as.character: as.character(i$Value) and it will run as desired.




回答2:


Using R you could try:

library(purrr)

df %>%
  split(.$Id) %>%
  map(~ t(combn(.$Value, 2)))

Which gives:

#$`1`
#     [,1] [,2]
#[1,] "a"  "c" 
#[2,] "a"  "h" 
#[3,] "c"  "h" 
#
#$`2`
#     [,1] [,2]
#[1,] f    a   
#Levels: a c f h



回答3:


Just another way (possibly slightly faster as it exploits the fact that you're looking for all pairs, and avoids combn and t):

require(data.table)
dt[, .( c1 = rep(Value, (.N:1)-1L), c2 = rep(Value, (1:.N)-1L) ), by=Id]
#    Id c1 c2
# 1:  1  a  c
# 2:  1  a  h
# 3:  1  c  h
# 4:  2  f  a

.N contains the number of observations for each group.


where dt is:

dt = fread('Id  Value
1   a
2   f
1   c
1   h
2   a')


来源:https://stackoverflow.com/questions/38333533/couple-the-data-in-all-possible-combinations

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!