How can I parallelize combn()?

别说谁变了你拦得住时间么 提交于 2019-12-07 07:01:45

问题


The function combn() generates all combinations of the elements of x taken m at a time. It is very fast and efficient for nCm small (where n is the number of elements of x) but it quickly runs out of memory. For example:

> combn(c(1:50), 12, simplify = TRUE)
Error in matrix(r, nrow = len.r, ncol = count) : 
invalid 'ncol' value (too large or NA)

I would like to know if the function combn() can be modified such that it generates only k chosen combinations. Let's call this new function chosencombn(). Then we would have:

> combn(c("a", "b", "c", "d"), m=2)
     [,1] [,2] [,3] [,4] [,5] [,6]
 [1,] "a"  "a"  "a"  "b"  "b"  "c" 
 [2,] "b"  "c"  "d"  "c"  "d"  "d" 

>chosencombn(c("a", "b", "c", "d"), m=2, i=c(1,4,6))
     [,1] [,2] [,3]
 [1,] "a"  "b"  "c" 
 [2,] "b"  "c"  "d"

>chosencombn(c("a", "b", "c", "d"), m=2, i=c(4,5))
     [,1] [,2]
 [1,] "b"  "b" 
 [2,] "c"  "d" 

I understand that such a function would require the use of an ordering of the combinations such that one can find the place of a given combination immediately. Does such an ordering exist? Can it be coded to obtain a function as efficient as combn()?


回答1:


Package "trotter" is useful for this as it does not keep the permutations in memory.

library(trotter)

combs = cpv(2, c("a", "b", "c", "d"))
sapply(c(1, 4, 6), function(i) combs[i])
#     [,1] [,2] [,3]
#[1,] "a"  "b"  "c" 
#[2,] "b"  "c"  "d"



回答2:


To get a sense of how combn orders its output, let's look at the output of combn(1:5, 3):

combn(1:5, 3)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    1    1    2    2    2     3
# [2,]    2    2    2    3    3    4    3    3    4     4
# [3,]    3    4    5    4    5    5    4    5    5     5

There is a lot of structure here. First, all columns are ordered as you go downward, and the first row is non-decreasing. The columns starting with 1 have combn(2:5, 2) below them; the columns starting with 2 have combn(3:5, 2) below them; and so on.

Let's now think of how to construct column number 8. The approach I would take to reconstruct would be to determine the first element of that column (due to the relationship above there are choose(4, 2)=6 columns starting with 1, choose(3, 2)=3 columns starting with 2, and choose(2, 2)=1 column starting with 3). In our case we determine that we start with 2, since columns 7-9 must start with 2.

To determine the second and subsequent elements of the column, we repeat the process with a smaller number of elements (since 2 is our first element, we're now selecting from elements 3-5), a new position (we're selecting column number 8-6=2 that begins with a 2), and a new number of remaining elements to select (we need 3-1=2 more elements).

getcombn below is an iterative formulation that does just this:

getcombn <- function(x, m, pos) {
  combo <- rep(NA, m)
  start <- 1
  for (i in seq_len(m-1)) {
    end.pos <- cumsum(choose((length(x)-start):(m-i), m-i))
    selection <- which.max(end.pos >= pos)
    start <- start + selection
    combo[i] <- x[start - 1]
    pos <- pos - c(0, end.pos)[selection]
  }
  combo[m] <- x[start + pos - 1]
  combo
}

chosencombn <- function(x, m, all.pos) {
  sapply(all.pos, function(pos) getcombn(x, m, pos))
}
chosencombn(c("a", "b", "c", "d"), 2, c(1,4,6))
#     [,1] [,2] [,3]
# [1,] "a"  "b"  "c" 
# [2,] "b"  "c"  "d" 
chosencombn(c("a", "b", "c", "d"), 2, c(4,5))
#     [,1] [,2]
# [1,] "b"  "b" 
# [2,] "c"  "d" 

This enables you to compute particular columns in cases where it would be impossible to enumerate all the combinations (you would run out of memory). For instance, with 50 options, the number of ways to select 25 elements is a 14-digit number, so enumerating all combinations is probably not an option. Still, you can compute specific indicated combinations:

chosencombn(1:50, 25, c(1, 1000000L, 1e14))
#       [,1] [,2] [,3]
#  [1,]    1    1    3
#  [2,]    2    2    4
#  [3,]    3    3    6
#  [4,]    4    4    7
#  [5,]    5    5    8
#  [6,]    6    6   11
#  [7,]    7    7   14
#  [8,]    8    8   15
#  [9,]    9    9   17
# [10,]   10   10   20
# [11,]   11   11   22
# [12,]   12   12   25
# [13,]   13   13   27
# [14,]   14   14   30
# [15,]   15   15   31
# [16,]   16   16   32
# [17,]   17   17   33
# [18,]   18   18   36
# [19,]   19   20   37
# [20,]   20   23   39
# [21,]   21   27   40
# [22,]   22   39   42
# [23,]   23   42   47
# [24,]   24   45   48
# [25,]   25   49   50


来源:https://stackoverflow.com/questions/35301986/how-can-i-parallelize-combn

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!