How can I parallelize combn()?

夙愿已清 posted on 2019-12-05 11:37:27

The "trotter" package is useful for this, as it does not keep the combinations in memory.

library(trotter)

combs = cpv(2, c("a", "b", "c", "d"))
sapply(c(1, 4, 6), function(i) combs[i])
#     [,1] [,2] [,3]
#[1,] "a"  "b"  "c" 
#[2,] "b"  "c"  "d"

To get a sense of how combn orders its output, let's look at the output of combn(1:5, 3):

combn(1:5, 3)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    1    1    2    2    2     3
# [2,]    2    2    2    3    3    4    3    3    4     4
# [3,]    3    4    5    4    5    5    4    5    5     5

There is a lot of structure here. First, every column is increasing from top to bottom, and the first row is non-decreasing from left to right. The columns starting with 1 have combn(2:5, 2) below them; the columns starting with 2 have combn(3:5, 2) below them; and so on.
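The nesting described above can be checked directly. This is only a small sketch; the variable names `full`, `below1` and `below2` are illustrative and not part of the original answer.

```r
full <- combn(1:5, 3)

# Rows 2-3 of the columns whose first element is 1
below1 <- full[-1, full[1, ] == 1]
all(below1 == combn(2:5, 2))  # TRUE

# Rows 2-3 of the columns whose first element is 2
below2 <- full[-1, full[1, ] == 2]
all(below2 == combn(3:5, 2))  # TRUE
```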

Let's now think about how to construct column number 8. The approach I would take is to first determine the first element of that column. By the relationship above, there are choose(4, 2)=6 columns starting with 1, choose(3, 2)=3 columns starting with 2, and choose(2, 2)=1 column starting with 3. Columns 1-6 therefore start with 1, columns 7-9 start with 2, and column 10 starts with 3, so column 8 must start with 2.

To determine the second and subsequent elements of the column, we repeat the process with a smaller set of elements (since 2 is our first element, we're now selecting from elements 3-5), a new position (we want the 8-6=2nd of the columns that begin with 2), and a new number of remaining elements to select (we need 3-1=2 more elements).
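The arithmetic for column 8 can be traced step by step. This is a hand-unrolled sketch of the reasoning above, not a general function; the names `first`, `second`, `third` and `sel` are illustrative.

```r
pos <- 8

# Step 1: choose 3 elements from 1:5.
# Cumulative column counts for first element 1, 2, 3:
end.pos <- cumsum(choose(4:2, 2))    # 6 9 10
sel <- which.max(end.pos >= pos)     # 2
first <- (1:5)[sel]                  # 2
pos <- pos - c(0, end.pos)[sel]      # 8 - 6 = 2

# Step 2: choose 2 elements from 3:5.
# Cumulative column counts for first element 3, 4:
end.pos <- cumsum(choose(2:1, 1))    # 2 3
sel <- which.max(end.pos >= pos)     # 1
second <- (3:5)[sel]                 # 3
pos <- pos - c(0, end.pos)[sel]      # 2 - 0 = 2

# Step 3: one element left to choose from 4:5, at position 2
third <- (4:5)[pos]                  # 5

c(first, second, third)              # 2 3 5, same as combn(1:5, 3)[, 8]
```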

getcombn below is an iterative formulation that does just this:

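getcombn <- function(x, m, pos) {
  combo <- rep(NA, m)
  start <- 1  # index in x of the first element still available
  for (i in seq_len(m-1)) {
    # Cumulative number of columns for each candidate i-th element
    end.pos <- cumsum(choose((length(x)-start):(m-i), m-i))
    # First candidate whose block of columns reaches pos
    selection <- which.max(end.pos >= pos)
    start <- start + selection
    combo[i] <- x[start - 1]
    # Position within the block of columns sharing this element
    pos <- pos - c(0, end.pos)[selection]
  }
  # The last element is picked directly from what remains
  combo[m] <- x[start + pos - 1]
  combo
}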

chosencombn <- function(x, m, all.pos) {
  sapply(all.pos, function(pos) getcombn(x, m, pos))
}
chosencombn(c("a", "b", "c", "d"), 2, c(1,4,6))
#     [,1] [,2] [,3]
# [1,] "a"  "b"  "c" 
# [2,] "b"  "c"  "d" 
chosencombn(c("a", "b", "c", "d"), 2, c(4,5))
#     [,1] [,2]
# [1,] "b"  "b" 
# [2,] "c"  "d" 
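
For comparison, the same idea can also be written recursively: pick the first element, then recurse on the remaining tail with the adjusted position. The function name `getcombn_rec` is hypothetical, and this is a sketch for checking the reasoning against combn on a small input rather than a replacement for the iterative version.

```r
getcombn_rec <- function(x, m, pos) {
  # Base case: with one element left to choose, pos indexes x directly
  if (m == 1) return(x[pos])
  n <- length(x)
  # Cumulative number of columns starting with x[1], x[2], ...
  end.pos <- cumsum(choose((n - 1):(m - 1), m - 1))
  first <- which.max(end.pos >= pos)
  # Recurse on the elements after x[first], with the position
  # counted inside the block of columns that start with x[first]
  c(x[first],
    getcombn_rec(x[-seq_len(first)], m - 1, pos - c(0, end.pos)[first]))
}

# Compare every column against combn on a small input
full <- combn(1:6, 3)
rec <- sapply(seq_len(ncol(full)), function(p) getcombn_rec(1:6, 3, p))
all(rec == full)  # TRUE
```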

This enables you to compute particular columns in cases where it would be impossible to enumerate all the combinations (you would run out of memory). For instance, with 50 options, the number of ways to select 25 elements is a 15-digit number (choose(50, 25) is about 1.26e14), so enumerating all combinations is probably not an option. Still, you can compute specific indicated combinations:

chosencombn(1:50, 25, c(1, 1000000L, 1e14))
#       [,1] [,2] [,3]
#  [1,]    1    1    3
#  [2,]    2    2    4
#  [3,]    3    3    6
#  [4,]    4    4    7
#  [5,]    5    5    8
#  [6,]    6    6   11
#  [7,]    7    7   14
#  [8,]    8    8   15
#  [9,]    9    9   17
# [10,]   10   10   20
# [11,]   11   11   22
# [12,]   12   12   25
# [13,]   13   13   27
# [14,]   14   14   30
# [15,]   15   15   31
# [16,]   16   16   32
# [17,]   17   17   33
# [18,]   18   18   36
# [19,]   19   20   37
# [20,]   20   23   39
# [21,]   21   27   40
# [22,]   22   39   42
# [23,]   23   42   47
# [24,]   24   45   48
# [25,]   25   49   50
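
Because each column depends only on its position, the positions can be split into chunks and handed to separate workers, which is one way to approach the parallelization asked about in the title. The following is a rough sketch using `mclapply` from the `parallel` package that ships with R. The names `parcombn`, `n.chunks` and `mc.cores` are hypothetical, `getcombn` is repeated from above so that the block runs on its own, and it assumes `all.pos` is non-empty. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Repeated from the answer above so this block runs on its own
getcombn <- function(x, m, pos) {
  combo <- rep(NA, m)
  start <- 1
  for (i in seq_len(m-1)) {
    end.pos <- cumsum(choose((length(x)-start):(m-i), m-i))
    selection <- which.max(end.pos >= pos)
    start <- start + selection
    combo[i] <- x[start - 1]
    pos <- pos - c(0, end.pos)[selection]
  }
  combo[m] <- x[start + pos - 1]
  combo
}

parcombn <- function(x, m, all.pos, n.chunks = 2, mc.cores = 2) {
  # mclapply relies on forking, which Windows does not support
  if (.Platform$OS.type == "windows") mc.cores <- 1
  # Contiguous chunks keep the columns in their original order
  chunk.id <- ceiling(seq_along(all.pos) * n.chunks / length(all.pos))
  chunks <- split(all.pos, chunk.id)
  parts <- mclapply(chunks, function(chunk) {
    cols <- lapply(chunk, function(pos) getcombn(x, m, pos))
    matrix(unlist(cols), nrow = m)
  }, mc.cores = mc.cores)
  do.call(cbind, unname(parts))
}

# Check against combn on a small input
res <- parcombn(1:6, 3, seq_len(choose(6, 3)))
all(res == combn(1:6, 3))
```

For a large problem, each worker could process its chunk of columns directly instead of returning them, so that the full matrix never has to be held in memory at once.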