I have this R function to generate a matrix of all combinations of k numbers between 0 and n whose sum equals n. This is one of the bottlenecks of my program, as it becomes extremely slow as n and k grow.
Here's a different approach, which incrementally expands the set from size 1 to k, at each iteration pruning the combinations whose sums exceed n. This should result in speedups where you have a large k relative to n, because you won't need to compute anything close to the size of the power set.
sum.comb2 <- function(n, k) {
  combos <- 0:n   # partial combinations so far, stored as space-separated strings
  sums <- 0:n     # running sum of each partial combination
  for (width in 2:k) {
    # Extend every surviving combination with each value in 0:n
    combos <- apply(expand.grid(combos, 0:n), 1, paste, collapse = " ")
    sums <- apply(expand.grid(sums, 0:n), 1, sum)
    if (width == k) {
      # Last position filled: keep only the combinations that hit the target exactly
      return(combos[sums == n])
    } else {
      # Prune anything that already exceeds n; it can never come back down
      combos <- combos[sums <= n]
      sums <- sums[sums <= n]
    }
  }
}
# Simple test
sum.comb2(3, 2)
# [1] "3 0" "2 1" "1 2" "0 3"
Here's an example of the speedups with small n and large k:
library(microbenchmark)
microbenchmark(sum.comb2(1, 100))
# Unit: milliseconds
# expr min lq median uq max neval
# sum.comb2(1, 100) 149.0392 158.716 162.1919 174.0482 236.2095 100
This approach runs in about 160 milliseconds (median), while the approach with the power set would of course never get past the call to expand.grid, since you'd end up with 2^100 rows in the resulting matrix.
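The sum.comb timed below is the function from the original post, which isn't reproduced in this answer. For anyone who wants to rerun the comparison, an assumed stand-in of the same shape, building the full (n+1)^k grid with expand.grid and then filtering on the row sums, could look something like this:

# Assumed baseline for the benchmark only; the exact sum.comb from the question may differ
sum.comb <- function(n, k) {
  grid <- expand.grid(rep(list(0:n), k))          # (n + 1)^k candidate rows
  keep <- rowSums(grid) == n                      # keep only rows hitting the target sum
  apply(grid[keep, , drop = FALSE], 1, paste, collapse = " ")
}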
Even in a less extreme case, with n = 3 and k = 10, we see a roughly 17x speedup (comparing medians) over the function in the original post:
microbenchmark(sum.comb(3, 10), sum.comb2(3, 10))
# Unit: milliseconds
# expr min lq median uq max neval
# sum.comb(3, 10) 404.00895 439.94472 446.67452 461.24909 574.80426 100
# sum.comb2(3, 10) 23.27445 24.53771 25.60409 26.97439 65.59576 100