I have this R function to generate a matrix of all combinations of k numbers between 0 and n whose sum equals n. This is one of the bottlenecks of my program, as it becomes extremely slow as n and k grow.
Here's a different approach, which incrementally expands the set from size 1 to k, at each iteration pruning the combinations whose sums exceed n. This should result in speedups where you have a large k relative to n, because you won't need to compute anything close to the size of the power set.
sum.comb2 <- function(n, k) {
  combos <- 0:n   # partial combinations so far, stored as space-separated strings
  sums <- 0:n     # running sum of each partial combination
  for (width in 2:k) {
    # Extend every surviving combination with each value in 0:n
    combos <- apply(expand.grid(combos, 0:n), 1, paste, collapse = " ")
    sums <- apply(expand.grid(sums, 0:n), 1, sum)
    if (width == k) {
      # Last position filled: keep only the combinations that hit the target exactly
      return(combos[sums == n])
    } else {
      # Prune anything that already exceeds n; it can never come back down
      combos <- combos[sums <= n]
      sums <- sums[sums <= n]
    }
  }
}
# Simple test
sum.comb2(3, 2)
# [1] "3 0" "2 1" "1 2" "0 3"
Here's an example of the speedups with small n and large k:
library(microbenchmark)
microbenchmark(sum.comb2(1, 100))
# Unit: milliseconds
# expr min lq median uq max neval
# sum.comb2(1, 100) 149.0392 158.716 162.1919 174.0482 236.2095 100
This approach runs in about 160 milliseconds (median), while the approach with the power set would of course never get past the call to expand.grid, since you'd end up with 2^100 rows in the resulting matrix.
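The sum.comb timed below is the function from the original post, which isn't reproduced in this answer. For anyone who wants to rerun the comparison, an assumed stand-in of the same shape, building the full (n+1)^k grid with expand.grid and then filtering on the row sums, could look something like this:

# Assumed baseline for the benchmark only; the exact sum.comb from the question may differ
sum.comb <- function(n, k) {
  grid <- expand.grid(rep(list(0:n), k))          # (n + 1)^k candidate rows
  keep <- rowSums(grid) == n                      # keep only rows hitting the target sum
  apply(grid[keep, , drop = FALSE], 1, paste, collapse = " ")
}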
Even in a less extreme case, with n = 3 and k = 10, we see a roughly 17x speedup (comparing medians) over the function in the original post:
microbenchmark(sum.comb(3, 10), sum.comb2(3, 10))
# Unit: milliseconds
# expr min lq median uq max neval
# sum.comb(3, 10) 404.00895 439.94472 446.67452 461.24909 574.80426 100
# sum.comb2(3, 10) 23.27445 24.53771 25.60409 26.97439 65.59576 100