Algorithm to calculate power set (all possible subsets) of a set in R

后端 未结 4 1486
故里飘歌
故里飘歌 2020-12-11 05:52

I couldn\'t find an answer to this anywhere, so here\'s my solution.

The question is: how can you calculate a power set in R?

It is possible to do this with

相关标签:
4条回答
  • 2020-12-11 06:00

    A subset can be seen as a boolean vector, indicating whether an element is in the subset of not. Those boolean vectors can be seen as numbers written in binary. Enumerating all the subsets of 1:n is therefore equivalent to enumerating the numbers from 0 to 2^n-1.

    f <- function(set) { 
      n <- length(set)
      masks <- 2^(1:n-1)
      lapply( 1:2^n-1, function(u) set[ bitwAnd(u, masks) != 0 ] )
    }
    f(LETTERS[1:4])
    
    0 讨论(0)
  • 2020-12-11 06:08

    The below should create a power set, minus the empty set element.

    powerset <- function(x) {
      sets <- lapply(1:(length(x)), function(i) combn(x, i, simplify = F))
      unlist(sets, recursive = F)
    }
    
    0 讨论(0)
  • 2020-12-11 06:17

    Here's another simple approach that seems to perform reasonably well for small sets. There are obvious memory issues with this method as the data cardinality increases.

    getPowSet <- function(set) {     
      n <- length(set)
      keepBool <- sapply(2^(1:n - 1), function(k) 
        rep(c(FALSE, TRUE), each=k, times=(2^n / (2*k))))
      lapply(1:2^n, function(j) set[keepBool[j, ]])
    }
    

    Speed comparison for n=10:

    microbenchmark(powerset(LETTERS[1:10]), f(LETTERS[1:10]), getPowSet(LETTERS[1:10]))
    
    Unit: milliseconds
                         expr      min       lq     mean   median       uq      max neval
      powerset(LETTERS[1:10]) 2.466167 2.551928 2.656964 2.581211 2.637358 3.676877   100
             f(LETTERS[1:10]) 2.923339 3.029928 3.115222 3.104413 3.175931 4.033136   100
     getPowSet(LETTERS[1:10]) 2.415290 2.490522 2.574131 2.547115 2.617198 3.664040   100
    

    But then for n=15 the original function seems to perform better:

    microbenchmark(powerset(LETTERS[1:15]), f(LETTERS[1:15]), getPowSet(LETTERS[1:15]))
    
    Unit: milliseconds
                         expr       min        lq      mean    median       uq      max neval
      powerset(LETTERS[1:15])  81.48276  88.50272  94.88927  91.61366  94.8262 174.0636   100
             f(LETTERS[1:15]) 110.86208 118.08736 124.38501 122.35872 126.7830 189.3131   100
     getPowSet(LETTERS[1:15])  86.16286  93.32314  98.14936  96.85443 100.6075 159.1030   100
    
    0 讨论(0)
  • 2020-12-11 06:24

    There is a function powerset in the package HapEstXXR which seems to be faster than your function and the function in the other answer. See below (your function is called your.powerset)

    require('microbenchmark')
    require('HapEstXXR')
    microbenchmark(powerset(LETTERS[1:10]), f(LETTERS[1:10]), your.powerset(LETTERS[1:10]), times=100)
    
    
    Unit: microseconds
                             expr      min        lq    median        uq       max neval
          powerset(LETTERS[1:10])  314.845  388.4225  594.2445  686.6455   857.513   100
                 f(LETTERS[1:10]) 7144.132 7937.6040 8222.1330 8568.5120 17805.335   100
     your.powerset(LETTERS[1:10]) 5016.981 5564.2880 5841.9810 6025.0690 29138.763   100
    

    Since powerset seems to be very fast you might want to look at the code of the function in the HapEstXXR package.

    0 讨论(0)
提交回复
热议问题