Generating a very large matrix of string combinations using combn() and bigmemory package

星月不相逢 2020-12-16 05:30

I have a vector x of 1,344 unique strings. I want to generate a matrix that contains every possible group of three values, regardless of order, and export it to a CSV file.

3 answers
  •  长情又很酷
    2020-12-16 06:16

    You could first find all 2-way combinations, then combine them with the third value, writing each batch to file as you go. This takes a lot less memory:

    combn.mod <- function(x, fname) {
      # All unordered pairs, kept as a list so they can be filtered
      tmp <- combn(x, 2, simplify = FALSE)
      n <- length(x)
      # Stop two values before the end: after that no pairs are left to extend
      for (i in x[-c(n, n - 1)]) {
        # Drop all pairs that contain value i
        keep <- !vapply(tmp, function(p) i %in% p, logical(1))
        tmp <- tmp[keep]
        # Add i to all remaining pairs and append the triples to the file
        out <- do.call(rbind, lapply(tmp, c, i))
        write(t(out), file = fname, ncolumns = 3, append = TRUE, sep = ",")
      }
    }
    
    combn.mod(x,"F:/Tmp/Test.txt")
    

    This is not as general as Joshua's answer, though; it is written specifically for your case. I guess it is faster (again, for this particular case), but I didn't make the comparison. The function works on my computer using a little over 50 MB (roughly estimated) when applied to your x.

    EDIT

    On a side note: if this is for simulation purposes, I find it hard to believe that any scientific application needs 400+ million simulation runs. You might be looking for the correct answer to the wrong question here...
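
    The "400+ million" figure is just the number of unordered triples that can be drawn from 1,344 values. A quick check, with a rough memory estimate that assumes 64-bit R storing one 8-byte pointer per cell of a character matrix (an assumption for illustration, not a measurement):

    ```r
    n <- 1344                          # length of x, as stated in the question
    n_triples <- choose(n, 3)          # unordered groups of three
    n_triples                          # 403716544

    # Rough lower bound for holding the full result in memory:
    # 3 cells per row, assuming 8 bytes per cell (pointer only)
    approx_gib <- n_triples * 3 * 8 / 1024^3
    round(approx_gib, 1)
    ```

    This is why writing each batch to file, instead of building the whole matrix first, keeps memory use low.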

    PROOF OF CONCEPT :

    I replaced the write line with tt[[i]] <- out, added tt <- list() before the loop, and added return(tt) after it. Then:

    > do.call(rbind,combn.mod(letters[1:5]))
          [,1] [,2] [,3]
     [1,] "b"  "c"  "a" 
     [2,] "b"  "d"  "a" 
     [3,] "b"  "e"  "a" 
     [4,] "c"  "d"  "a" 
     [5,] "c"  "e"  "a" 
     [6,] "d"  "e"  "a" 
     [7,] "c"  "d"  "b" 
     [8,] "c"  "e"  "b" 
     [9,] "d"  "e"  "b" 
    [10,] "d"  "e"  "c" 
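
    For reference, the modified function described above might look like the sketch below. The name combn.mod.list is made up here to keep it apart from the file-writing version, and the fname argument is dropped because nothing is written; it assumes the values in x are unique, as in the question.

    ```r
    # Hypothetical in-memory variant of combn.mod, for small inputs only
    combn.mod.list <- function(x) {
      tmp <- combn(x, 2, simplify = FALSE)   # all unordered pairs
      n <- length(x)
      tt <- list()                           # one matrix of triples per value i
      for (i in x[-c(n, n - 1)]) {
        # Drop all pairs that contain value i
        keep <- !vapply(tmp, function(p) i %in% p, logical(1))
        tmp <- tmp[keep]
        # Store the triples instead of writing them to file
        tt[[i]] <- do.call(rbind, lapply(tmp, c, i))
      }
      tt
    }

    res <- do.call(rbind, combn.mod.list(letters[1:5]))

    # Compare against combn(), ignoring the order within and between rows
    key <- function(m) sort(apply(m, 1, function(r) paste(sort(r), collapse = "")))
    same_as_combn <- identical(key(res), key(t(combn(letters[1:5], 3))))
    ```

    Comparing against combn() is only practical for small inputs like this one; for the full x, the file-writing version is the one to use.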
    
