Generating a very large matrix of string combinations using combn() and bigmemory package

星月不相逢 2020-12-16 05:30

I have a vector x of 1,344 unique strings. I want to generate a matrix of all possible groups of three values, regardless of order, and export it to a CSV file.

3 Answers
  •  别那么骄傲
    2020-12-16 06:20

    Here's a function I've written in R, which currently finds its (unexported) home in the LSPM package. You give it the total number of items n, the number of items to select r, and the index of the combination you want i; it returns the values in 1:n corresponding to combination i.

    ".combinadic" <- function(n, r, i) {
    
      # http://msdn.microsoft.com/en-us/library/aa289166(VS.71).aspx
      # http://en.wikipedia.org/wiki/Combinadic
    
      if (i < 1 || i > choose(n, r)) stop("'i' must satisfy 0 < i <= choose(n, r)")
    
      largestV <- function(n, r, i) {
        #v <- n-1
        v <- n                                  # Adjusted for one-based indexing
        #while(choose(v,r) > i) v <- v-1
        while(choose(v,r) >= i) v <- v-1        # Adjusted for one-based indexing
        return(v)
      }
    
      res <- rep(NA,r)
      for(j in 1:r) {
        res[j] <- largestV(n,r,i)
        i <- i-choose(res[j],r)
        n <- res[j]
        r <- r-1
      }
      res <- res + 1
      return(res)
    }
    

    It allows you to generate each combination based on the value of the lexicographic index:

    > .combinadic(1344, 3, 1)
    [1] 3 2 1
    > .combinadic(1344, 3, 2)
    [1] 4 2 1
    > .combinadic(1344, 3, 403716544)
    [1] 1344 1343 1342
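
    The last index in these examples is just choose(1344, 3). As a rough sketch of the scale involved, assuming 4 bytes per stored integer index (the strings themselves would need more), the arithmetic looks like this:

    ```r
    n <- 1344
    r <- 3

    # Number of unordered triples: n * (n - 1) * (n - 2) / 3!
    total <- choose(n, r)
    print(format(total, big.mark = ","))

    # Approximate size of an integer index matrix with one row per combination,
    # assuming 4 bytes per integer (an assumption, not a measurement).
    gib <- total * r * 4 / 2^30
    print(round(gib, 1))
    ```

    That is roughly 4.5 GiB for the indices alone, which is why building the whole matrix in one go is unattractive.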
    

    So you just need to loop over 1:403716544 and append the results to a file. It may take a while, but it's at least feasible (see Dirk's answer). You may also need to do it in several loops, since the vector 1:403716544 will not fit in memory on my machine.
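
    A minimal sketch of that chunked loop, so that only one block of indices is held in memory at a time. The names unrank_comb and write_combinations and the chunk_size parameter are hypothetical; unrank_comb is a compact restatement of the function above (without the range check) so the snippet runs on its own, and the demo uses a toy vector rather than the 1,344 strings.

    ```r
    # Hypothetical helper: same unranking idea as .combinadic, restated compactly.
    unrank_comb <- function(n, r, i) {
      res <- numeric(r)
      for (j in seq_len(r)) {
        v <- n
        while (choose(v, r) >= i) v <- v - 1   # largest v with choose(v, r) < i
        res[j] <- v
        i <- i - choose(v, r)
        n <- v
        r <- r - 1
      }
      res + 1                                  # shift to one-based positions
    }

    # Hypothetical helper: write every r-combination of x to a CSV file,
    # processing chunk_size indices per pass and appending after the first pass.
    write_combinations <- function(x, r, file, chunk_size = 1e5) {
      n <- length(x)
      total <- choose(n, r)
      start <- 1
      while (start <= total) {
        end <- min(start + chunk_size - 1, total)
        idx <- vapply(seq(start, end),
                      function(i) unrank_comb(n, r, i),
                      numeric(r))
        rows <- t(matrix(x[idx], nrow = r))    # one combination per row
        write.table(rows, file, sep = ",",
                    row.names = FALSE, col.names = FALSE,
                    append = start > 1)        # first chunk overwrites the file
        start <- end + 1
      }
      invisible(total)
    }

    # Toy demo: 5 strings, groups of 3, written in chunks of 4 rows.
    x <- c("a", "b", "c", "d", "e")
    out <- tempfile(fileext = ".csv")
    write_combinations(x, 3, out, chunk_size = 4)
    written <- read.csv(out, header = FALSE, colClasses = "character")
    print(head(written, 3))
    ```

    For the full problem this pure-R loop would probably be slow, because each index is unranked from scratch; the chunking only addresses memory, not speed.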

    Or you could just port the R code to C/C++ and do the looping / writing there, since it would be a lot faster.
