Is there a limit on working with matrix in R with Rcpp?

Submitted by 岁酱吖の on 2019-12-02 12:59:07

Question


I was trying to develop a program in R to estimate a Spearman correlation with Rcpp. I did it, but it only works for matrices with fewer than roughly 45,000 to 50,000 vectors. I don't know why it only works up to that dimension. I suppose there is a limit for that type of object; maybe it would help to work with a data.frame instead? I would really appreciate it if someone could give me some insight.

Here is my code. I've been trying to keep the variable I call "denominador" within the maximum integer value, which it exceeds. Maybe you could help me.

library(Rcpp)

cppFunction('double spearman(NumericMatrix x){
 int nrow = x.nrow(), ncol = x.ncol();
 int nrow1 = nrow - 1;
 double out = 0;
 double cont = 0;
 double cont1 = 0;
 double r = 0;
 int denominador = ncol*(pow(ncol,2.0)-1);

 for(int i = 0; i < nrow1; i++){
  // Pair row i with every later row j, so each combination of vectors is visited once
  for(int j = i +1; j < nrow; j++){
   cont1 = 0;
   // Sum of squared differences between rows i and j
   for(int t = 0; t < ncol; t++){
    cont = pow(x(i,t)-x(j,t), 2.0);
    cont1 += cont;
   }
   // Accumulate this pair's contribution to the mean of all pairwise correlations
   r = 2*(1-6*(cont1/denominador))/(nrow*nrow1);
   out += r;
  }
 }
 return out;
}')
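
One possible explanation for the 45,000 to 50,000 threshold, offered as a sketch rather than a confirmed diagnosis: 46341^2 is the first square above 2^31 - 1, so the int product nrow*nrow1 in the function above would overflow from 46,342 rows upward, and the int denominador would overflow once ncol exceeds 1,290. The helper names below are hypothetical and the code is plain C++ with no Rcpp dependency; it only checks where those two products leave the 32-bit range.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a 32-bit signed int can hold: 2^31 - 1 = 2147483647.
constexpr std::int64_t kIntMax = std::numeric_limits<std::int32_t>::max();

// Hypothetical helper: true when nrow * (nrow - 1) does not fit in a 32-bit int.
// The product is formed in 64 bits, which is exact for any valid R dimension.
bool pair_product_overflows_int(std::int64_t nrow) {
    assert(nrow >= 0 && nrow <= kIntMax);  // R dimensions never exceed 2^31 - 1
    return nrow * (nrow - 1) > kIntMax;
}

// Hypothetical helper: true when ncol * (ncol^2 - 1) does not fit in a 32-bit int.
// Computed in double, mirroring the pow() call in the original function.
bool denominator_overflows_int(std::int64_t ncol) {
    assert(ncol >= 0 && ncol <= kIntMax);
    const double n = static_cast<double>(ncol);
    return n * (n * n - 1.0) > static_cast<double>(kIntMax);
}
```

With these helpers, pair_product_overflows_int(46341) is false and pair_product_overflows_int(46342) is true, which falls inside the range reported in the question. If that is indeed the cause, declaring denominador as double and writing the divisor as (double)nrow * nrow1 should avoid both overflows.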

Answer 1:


To repeat more succinctly:

  1. You can have more than 2^31-1 elements in a vector.

  2. Matrices are vectors with dim attributes.

  3. You can have more than 2^31-1 elements in a matrix (i.e., n times k).

  4. Your row and column indices are still each limited to 2^31-1.
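
The four points above can be sketched as a small check. This is an illustration in plain C++ (no Rcpp), and the names dims_are_valid and element_count are made up for this example: each dimension must fit in a 32-bit int, while the element count is computed in 64 bits and may exceed 2^31 - 1.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kMaxDim = 2147483647;  // 2^31 - 1, the per-dimension limit

// Hypothetical helper: each dimension must lie in [0, 2^31 - 1].
bool dims_are_valid(std::int64_t nrow, std::int64_t ncol) {
    return nrow >= 0 && nrow <= kMaxDim && ncol >= 0 && ncol <= kMaxDim;
}

// Hypothetical helper: total number of elements (n times k), held in 64 bits.
std::int64_t element_count(std::int64_t nrow, std::int64_t ncol) {
    assert(dims_are_valid(nrow, ncol));
    return nrow * ncol;  // at most (2^31 - 1)^2, which fits in int64_t
}
```

For a 50000 x 50000 matrix, dims_are_valid returns true and element_count returns 2500000000, which is already larger than 2^31 - 1.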

Example of a big vector:

R> n <- .Machine$integer.max + 100
R> tmpVec <- 1:n
R> length(tmpVec)
[1] 2147483747
R> newVec <- sqrt(tmpVec)
R> 



Answer 2:


A couple caveats

Before we get started, I'm assuming:

  • R >= 3.0.0
    • Long vectors, which allow up to 2^52 elements, are supported from this version on
  • Rcpp >= 0.12.0
    • Includes the patch in which thirdwing replaced instances of int and size_t with R_xlen_t and R_xlength. See the release post for more details.
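
To illustrate why replacing int with a wider type matters, here is a sketch in plain C++ (no Rcpp) of the column-major offset of element (i, j). It uses std::int64_t as a stand-in for R_xlen_t, which is an assumption of this example, and linear_offset is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero-based column-major offset of element (i, j)
// in a matrix with nrow rows. All arithmetic is done in 64 bits.
std::int64_t linear_offset(std::int64_t i, std::int64_t j, std::int64_t nrow) {
    assert(i >= 0 && i < nrow && j >= 0);
    return i + j * nrow;
}
```

For the last element of a 50000 x 50000 matrix, linear_offset(49999, 49999, 50000) is 2499999999, which does not fit in a 32-bit int.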

Constructing a large NumericMatrix

I think you may be running into a memory allocation issue, as the following works on my 32 GB machine:

Rcpp::cppFunction("NumericMatrix make_matrix(){
                   NumericMatrix m(50000, 50000);
                   return m;
                  }")

m = make_matrix()

object.size(m)

## 20000000200 bytes # about 20.0000002 gb

Running:

# Creates an 18.6 GiB (about 20 GB) matrix
m = matrix(0, ncol = 50000, nrow = 50000)

Rcpp::cppFunction("void get_length(NumericMatrix m){
                   Rcout << m.nrow() << ' ' << m.ncol(); 
            }")

get_length(m)
## 50000 50000

object.size(m)
## 20000000200 bytes # about 20.0000002 gb
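
The two figures quoted above (18.6 and about 20) describe the same matrix in different units. A minimal sketch in plain C++, assuming 8 bytes per double and ignoring the small object overhead that object.size also counts; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for the data of an nrow x ncol matrix
// of doubles (8 bytes each on common platforms; header overhead not included).
std::uint64_t matrix_payload_bytes(std::uint64_t nrow, std::uint64_t ncol) {
    // Guard against 64-bit overflow of nrow * ncol * sizeof(double).
    assert(ncol == 0 || nrow <= UINT64_MAX / sizeof(double) / ncol);
    return nrow * ncol * sizeof(double);
}

// Decimal gigabytes (10^9 bytes).
double to_gb(std::uint64_t bytes) { return static_cast<double>(bytes) / 1e9; }

// Binary gibibytes (2^30 bytes).
double to_gib(std::uint64_t bytes) { return static_cast<double>(bytes) / 1073741824.0; }
```

For 50000 x 50000 this gives 20000000000 bytes, i.e. 20 GB or roughly 18.6 GiB; the extra 200 bytes in the object.size output are overhead outside the data itself.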

Matrix Bounds

In theory, the total number of elements in a matrix is bounded by (2^31 - 1)^2 = 4,611,686,014,132,420,609, per:

Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.

See Long Vector
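
The arithmetic behind that bound can be checked at compile time. This is a sketch in plain C++ and the constant names are made up for the example. It also compares the bound with the 2^52 long-vector limit mentioned in the caveats; since a matrix is a vector with dim attributes, the smaller of the two presumably applies in practice, which is an inference rather than something stated in the quoted documentation.

```cpp
#include <cstdint>

constexpr std::uint64_t kMaxDim = 2147483647ULL;          // 2^31 - 1
constexpr std::uint64_t kDimProduct = kMaxDim * kMaxDim;  // (2^31 - 1)^2
constexpr std::uint64_t kLongVectorLimit = 1ULL << 52;    // 2^52

static_assert(kDimProduct == 4611686014132420609ULL, "(2^31 - 1)^2");
static_assert(kDimProduct == (1ULL << 62) - (1ULL << 32) + 1, "2^62 - 2^32 + 1");
static_assert(kLongVectorLimit < kDimProduct, "2^52 is the tighter bound");

// Hypothetical helper: element-count bound when both limits are taken together.
constexpr std::uint64_t max_matrix_elements() {
    return kLongVectorLimit < kDimProduct ? kLongVectorLimit : kDimProduct;
}
```

Under that assumption, max_matrix_elements() evaluates to 4503599627370496, i.e. 2^52.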

Now, trying to exceed the per-dimension bound in a matrix:

m = matrix(nrow=2^31, ncol=1)
## Error in matrix(nrow = 2^31, ncol = 1) :
##   invalid 'nrow' value (too large or NA)
## In addition: Warning message:
## In matrix(nrow = 2^31, ncol = 1) :
##   NAs introduced by coercion to integer range

The limit that both R and Rcpp adhere to for the number of rows or columns is:

.Machine$integer.max
## 2147483647

Note that 2^31 exceeds this limit by exactly 1:

2^31 = 2,147,483,648 > 2,147,483,647 = .Machine$integer.max

Maximum Number of Elements in a Vector

However, the limit associated with a pure atomic vector is given as 2^52 (even though one might expect it to be in the ballpark of 2^64 - 1). The following example builds a vector of 2^32 elements by concatenating two vectors of 2^31 elements each:

v = numeric(2^31)
length(v)
## [1] 2147483648

object.size(v)
## 17179869224 bytes # about 17.179869224 gb

v2 = c(v,v)
length(v2)
## 4294967296

object.size(v2)
## 34359738408 bytes # about 34.359738408 gb
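
The 2^52 figure lies just below 2^53, the point past which a double can no longer represent every integer. Whether that is the actual reason for R's limit is an assumption here, not something the answer states. The sketch below, in plain C++ with a hypothetical helper name, only demonstrates the floating-point fact itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when n survives a round trip through double unchanged.
bool exactly_representable_as_double(std::uint64_t n) {
    assert(n < (1ULL << 63));  // keep the cast back to integer well defined
    const double d = static_cast<double>(n);
    return static_cast<std::uint64_t>(d) == n;
}
```

exactly_representable_as_double((1ULL << 52) + 1) is true, while exactly_representable_as_double((1ULL << 53) + 1) is false.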

Suggestions

  1. Use bigmemory via Rcpp
  2. Maintain your own stack of vectors.
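
The second suggestion could look roughly like the sketch below: a container that splits one long logical vector into several ordinary std::vector chunks and addresses it with a 64-bit index. This is plain C++ with no Rcpp or bigmemory dependency, the class name ChunkedVector and its interface are invented for this example, and it is not taken from the original answer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container: one logical vector stored as fixed-size chunks.
class ChunkedVector {
public:
    ChunkedVector(std::uint64_t length, std::uint64_t chunk_size)
        : length_(length), chunk_size_(chunk_size) {
        assert(chunk_size > 0);
        const std::uint64_t n_chunks = (length + chunk_size - 1) / chunk_size;
        chunks_.reserve(n_chunks);
        for (std::uint64_t c = 0; c < n_chunks; ++c) {
            // The last chunk may be shorter than the others.
            const std::uint64_t start = c * chunk_size;
            const std::uint64_t remaining = length - start;
            const std::uint64_t len = remaining < chunk_size ? remaining : chunk_size;
            chunks_.emplace_back(len, 0.0);  // zero-initialised chunk
        }
    }

    std::uint64_t size() const { return length_; }

    // Element access by 64-bit logical index.
    double& operator[](std::uint64_t i) {
        assert(i < length_);
        return chunks_[i / chunk_size_][i % chunk_size_];
    }

private:
    std::uint64_t length_;
    std::uint64_t chunk_size_;
    std::vector<std::vector<double>> chunks_;
};
```

For example, ChunkedVector v(10, 4) holds three chunks of lengths 4, 4 and 2, and v[9] refers to the second element of the last chunk.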


Source: https://stackoverflow.com/questions/38757326/is-there-a-limit-on-working-with-matrix-in-r-with-rcpp
