How to produce every permutation of 20 minus ones (-1) in a 1-by-41 vector of ones and do a simple calculation on each row?

Submitted by 无人久伴 on 2019-12-25 00:46:56

Question


I want to produce all permutations of 20 minus ones (-1) and 21 ones (1). This matrix has 269128937220 rows and 41 columns, and I want to do the following calculation on each row of it:

(SLS')/4 

where:

S is each row of this matrix (a 1 by 41 array).

S' is the transpose of S (a 41 by 1 array).

L is a 41 by 41 matrix.

The final result of each calculation is a single number.

Is there any way to produce this matrix and do the calculation without getting an out-of-memory error and in a reasonable time?

Thanks in advance.


Answer 1:


First off, you are probably better off rethinking your approach. With that said, let's get started attacking your problem.

This is a very difficult problem, mainly because of resource limits. Below, I have a solution that will complete in a reasonable amount of time on a home computer, provided you have access to a decent amount of storage (at least 7 TB). The algorithm below does not require that much memory and can be tuned to reduce memory usage.

Before we begin, we note that merely generating that many permutations seems impossible at first. However, with the help of highly optimized C++ code and parallel computing, the task is brought back into the realm of possibility. This was demonstrated in my answer to the OP's previous question. There, we used RcppAlgos (I am the author) and the parallel package to generate about 36 million permutations per second in chunks of one million using 8 cores.
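To get a feel for the scale, here is a minimal standalone C++ sketch (plain C++, not RcppAlgos code; the names binomial and countArrangements are made up for this illustration). It assumes the row count is the binomial coefficient C(41, 20), i.e. the number of ways to choose which 20 of the 41 positions hold -1, and cross-checks that formula against brute-force enumeration on small inputs only.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Binomial coefficient C(n, k) in 64-bit arithmetic.
// After step i the running value equals C(n - k + i, i), so every
// division is exact. For C(41, 20) the intermediates stay far below 2^64.
std::uint64_t binomial(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;   // use the smaller of k and n - k
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i;
    }
    return result;
}

// Count the distinct arrangements of `ones` +1s and `negOnes` -1s by
// stepping through them in lexicographic order. Only sensible for
// small inputs; it is here to cross-check binomial().
std::uint64_t countArrangements(unsigned ones, unsigned negOnes) {
    std::vector<int> s(negOnes, -1);
    s.insert(s.end(), ones, 1);   // sorted start: all -1s, then all +1s
    std::uint64_t count = 0;
    do {
        ++count;
    } while (std::next_permutation(s.begin(), s.end()));
    return count;
}
```

With these definitions, binomial(41, 20) evaluates to 269128937220, the row count quoted in the question, and countArrangements(3, 2) agrees with binomial(5, 2).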

Now, we are charged with carrying out specific computations on each permutation as fast as possible. The computation is as follows:

(SLS') / 4, where S is a permutation, L is a 41 x 41 matrix

Here are a couple of base R approaches (N.B. %*% treats a plain vector as a row or column vector as needed, so m1[x, ] %*% m2 %*% m1[x, ] is the same as m1[x, ] %*% m2 %*% matrix(m1[x, ], ncol = 1)):

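## Row-by-row version: one vector-matrix-vector product per permutation
baseTest1 <- function(m1, m2) {
    vapply(seq_len(nrow(m1)), function(x) {
        m1[x, ] %*% m2 %*% m1[x, ]
    }, FUN.VALUE = 1.1111, USE.NAMES = FALSE) / 4
}

## Compute every S %*% L product in a single matrix multiplication first,
## then take the dot product of each resulting row with its permutation
baseTest2 <- function(m1, m2) {
    temp <- m1 %*% m2
    vapply(seq_len(nrow(m1)), function(x) {
        crossprod(temp[x, ], m1[x, ])
    }, FUN.VALUE = 1.1111, USE.NAMES = FALSE) / 4
}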

Let's think about this a little bit. We have a bunch of permutations of the numbers one and negative one. When we multiply these permutations by a matrix of real numbers, say M, we end up simply adding and subtracting values from M. I bet we can speed this up quite a bit using Rcpp and avoid wasteful (and useless) identity multiplications (i.e. multiplications by 1).

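#include <Rcpp.h>

// Computes (S L S') / 4 for the first mySize rows of A, where each row
// of A is a permutation of +1/-1 and B is the 41 x 41 matrix L.
//[[Rcpp::export]]
Rcpp::NumericVector makeVecCpp(Rcpp::NumericMatrix A,
                               Rcpp::NumericMatrix B,
                               unsigned long int mySize) {

    Rcpp::NumericVector result = Rcpp::no_init_vector(mySize);
    double temp = 0;

    for (std::size_t i = 0; i < mySize; ++i) {
        for (std::size_t j = 0; j < 41u; ++j) {
            for (std::size_t k = 0; k < 41u; ++k) {
                // Entries are +1 or -1, so the sum is nonzero exactly
                // when A(i, j) and A(i, k) have the same sign, i.e.
                // when their product is +1.
                if (A(i, j) + A(i, k)) {
                    temp += B(j, k);
                } else {
                    temp -= B(j, k);
                }
            }
        }

        result[i] = temp / 4;
        temp = 0;   // reset the accumulator for the next row
    }

    return result;
}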
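The sign test in the inner loop relies on the identity S L S' = sum over j, k of s_j * s_k * L(j, k), where s_j * s_k is +1 when the two entries match and -1 otherwise. The following is a small standalone sketch (plain C++ without Rcpp; Matrix, quadFormMultiply and quadFormSigns are hypothetical names used only here) that puts the direct product and the add/subtract version side by side so they can be compared on a tiny input.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Direct evaluation: sum over j, k of s[j] * L[j][k] * s[k].
double quadFormMultiply(const std::vector<int>& s, const Matrix& L) {
    double total = 0.0;
    for (std::size_t j = 0; j < s.size(); ++j) {
        for (std::size_t k = 0; k < s.size(); ++k) {
            total += s[j] * L[j][k] * s[k];
        }
    }
    return total;
}

// Sign trick: for entries in {+1, -1}, s[j] * s[k] is +1 when the two
// entries are equal and -1 otherwise, so no multiplication is needed.
double quadFormSigns(const std::vector<int>& s, const Matrix& L) {
    double total = 0.0;
    for (std::size_t j = 0; j < s.size(); ++j) {
        for (std::size_t k = 0; k < s.size(); ++k) {
            if (s[j] == s[k]) {
                total += L[j][k];
            } else {
                total -= L[j][k];
            }
        }
    }
    return total;
}
```

For s = (1, -1, 1) and the 3 x 3 matrix with rows (1, 2, 3), (4, 5, 6), (7, 8, 9), both functions return 5, since 1 - 2 + 3 - 4 + 5 - 6 + 7 - 8 + 9 = 5.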

Now let's see if they give the same results and also benchmark them:

options(scipen = 999)
library(RcppAlgos)
library(microbenchmark)

set.seed(42)
M <- matrix(rnorm(41*41), nrow = 41, ncol = 41)

negOne <- permuteGeneral(c(1L, -1L), freqs = c(21, 20), upper = 100000)

all.equal(baseTest1(negOne, M), baseTest2(negOne, M))
# [1] TRUE
all.equal(baseTest1(negOne, M), makeVecCpp(negOne, M, 100000))
# [1] TRUE

microbenchmark(base1 = baseTest1(negOne, M), base2 = baseTest2(negOne, M), 
               myRcpp = makeVecCpp(negOne, M, 100000), times = 25)
Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
 base1 555.0256 582.2273 597.6447 593.7708 599.1380 690.3882    25
 base2 471.0251 494.2367 541.2632 531.1858 586.6774 632.7279    25
myRcpp 202.7637 207.2463 210.0255 209.0399 209.9648 240.6664    25

Our Rcpp implementation is the clear winner!! Moving on, we incorporate this into our final answer:

## WARNING Don't run this unless you have a few DAYS on your hands

library(parallel)
## break up into even intervals of one hundred thousand
firstPart <- mclapply(seq(1, 269128900000, 100000), function(x) {
    negOne <- permuteGeneral(c(1L, -1L), freqs = c(21, 20), 
                              lower = x, upper = x + 99999)
    vals <- makeVecCpp(negOne, M, 100000)
    write.csv(vals, paste0("myFile", x, ".csv", collapse = ""))
    x
}, mc.cores = 8)

## get the last few results and complete analysis
lastPart <- permuteGeneral(c(1L, -1L), freqs = c(21, 20), 
                           lower = 269128900001, upper = 269128937220)
vals <- makeVecCpp(lastPart, M, 37220)
write.csv(vals, paste0("myFile", 269128900001, ".csv", collapse = ""))
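The lower/upper pairs above are hard-coded, with the short final chunk handled separately. As a sketch of how such bounds could be derived for any total and chunk size (standalone C++; chunkBounds is a made-up helper, not part of RcppAlgos or the parallel package):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// 1-based, inclusive [lower, upper] ranges that cover 1..total in pieces
// of at most `chunk` rows. Only the last range may be shorter.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
chunkBounds(std::uint64_t total, std::uint64_t chunk) {
    std::vector<std::pair<std::uint64_t, std::uint64_t>> bounds;
    if (chunk == 0) return bounds;   // no valid partition exists
    for (std::uint64_t lower = 1; lower <= total; lower += chunk) {
        // Clamp the upper end of the final range to `total`.
        std::uint64_t upper =
            (total - lower < chunk - 1) ? total : lower + chunk - 1;
        bounds.emplace_back(lower, upper);
    }
    return bounds;
}
```

For total = 269128937220 and chunk = 100000 this yields 2691289 full ranges followed by one final range from 269128900001 to 269128937220, which holds the 37220 rows processed separately above.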

You will note that we avoid storing everything in memory by writing every one hundred thousand results to disk, hence the need for a huge hard drive. When I tested this, each file was about 2.5 MB, which would total about 6.4 TB:

a <- 2.5 * (2^20) ### convert to bytes
a * (269128937220 / 1e5) / 2^40 ## get terabytes
[1] 6.416534

To give you an idea of how long this computation will take, here is a timing for the first one hundred million results:

system.time(firstPart <- mclapply(seq(1, 100000000, 100000), function(x) {
    negOne <- permuteGeneral(c(1L, -1L), freqs = c(21, 20), 
                              lower = x, upper = x + 99999)
    vals <- makeVecCpp(negOne, M, 100000)
    write.csv(vals, paste0("myFile", x, ".csv", collapse = ""))
    x
}, mc.cores = 8))

   user  system elapsed 
529.931   9.557  80.690

80 seconds ain't that bad! That means we will only have to wait around for about 2.5 days:

(269128937220 / 100000000 / 60 / 60 / 24) * 80
[1] 2.491935
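Both back-of-the-envelope figures (storage and running time) are linear extrapolations from a measured sample. A minimal sketch of the same arithmetic in standalone C++, assuming the file size and timing stay constant across chunks (estimateTiB and estimateDays are hypothetical helper names):

```cpp
#include <cstdint>

// Total storage in units of 2^40 bytes, assuming every `chunk` results
// produce one file of `mibPerFile` units of 2^20 bytes.
double estimateTiB(std::uint64_t total, std::uint64_t chunk, double mibPerFile) {
    double files = static_cast<double>(total) / static_cast<double>(chunk);
    return mibPerFile * files / (1024.0 * 1024.0);
}

// Wall-clock days, extrapolated linearly from a timed sample of
// `sampleRows` rows that took `sampleSeconds` seconds.
double estimateDays(std::uint64_t total, std::uint64_t sampleRows,
                    double sampleSeconds) {
    double samples = static_cast<double>(total) / static_cast<double>(sampleRows);
    return samples * sampleSeconds / 86400.0;
}
```

Calling estimateTiB(269128937220, 100000, 2.5) and estimateDays(269128937220, 100000000, 80.0) reproduces the roughly 6.42 and 2.49 obtained in R above.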

If you really want to reduce this time, you will have to utilize a high performance computing service.

All results were obtained on a MacBook Pro with a 2.8 GHz quad-core processor (4 physical cores plus 4 virtual cores, 8 total).




Answer 2:


First, note that the result you expect is a numeric vector with more than 269 billion elements. At 8 bytes per element, you will need more than 2 TB of RAM just to store the result. If you don't have that much, it is hopeless to do what you ask for. Note also that you will need a long vector to store the result.

If you do have this amount of RAM, here is a solution based on combn with its FUN argument. This should be close to optimal in memory use. If you want to make it faster, try implementing compute_one directly with Rcpp.

k = 15 # should be 20
n = 2*k+1
L = matrix(runif(n*n), ncol=n)

compute_one = function(indices) {
    s = rep.int(1,n)
    s[indices] = -1
    drop(t(s) %*% L %*% s / 4)
}

res = combn(n, k, compute_one)
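For readers who want to try the suggested port, here is a rough standalone C++ sketch of the same idea (plain C++ rather than Rcpp; forEachCombination and computeOne are illustrative names, and indices are 0-based, unlike R). Each index set is handed to a callback, so it is the caller who decides whether results are stored.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Visit every k-subset of {0, ..., n-1} in lexicographic order.
void forEachCombination(
    std::size_t n, std::size_t k,
    const std::function<void(const std::vector<std::size_t>&)>& visit) {
    if (k > n) return;
    std::vector<std::size_t> idx(k);
    for (std::size_t i = 0; i < k; ++i) idx[i] = i;
    while (true) {
        visit(idx);
        // Find the rightmost position that can still be advanced.
        std::size_t i = k;
        while (i > 0 && idx[i - 1] == n - k + (i - 1)) --i;
        if (i == 0) return;   // that was the last combination
        ++idx[i - 1];
        for (std::size_t j = i; j < k; ++j) idx[j] = idx[j - 1] + 1;
    }
}

// Counterpart of compute_one: +1 everywhere, -1 at `indices`,
// then s L s' / 4.
double computeOne(const std::vector<std::size_t>& indices, const Matrix& L) {
    std::vector<int> s(L.size(), 1);
    for (std::size_t i : indices) s[i] = -1;
    double total = 0.0;
    for (std::size_t j = 0; j < s.size(); ++j) {
        for (std::size_t k = 0; k < s.size(); ++k) {
            total += s[j] * L[j][k] * s[k];
        }
    }
    return total / 4.0;
}
```

A caller could pass a lambda that appends computeOne(idx, L) to a vector, which mirrors what combn does with its FUN argument, or one that writes each value out in batches if the full result does not fit in RAM.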


Source: https://stackoverflow.com/questions/51594408/how-to-produce-every-permutation-of-20-one-1-in-a-1-by-41-vector-of-ones-and
