Is there a limit on working with matrix in R with Rcpp?

Posted by 非 Y 不嫁゛ on 2019-12-02 03:44:53

To repeat more succinctly:

  1. You can have more than 2^31-1 elements in a vector.

  2. Matrices are vectors with dim attributes.

  3. You can have more than 2^31-1 elements in a matrix (i.e., n times k).

  4. Your row and column indices are each still limited to 2^31-1.
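
To see why points 3 and 4 do not contradict each other, here is a minimal sketch in plain C++ (no Rcpp). It assumes the column-major layout that R uses for matrices and 0-based indices. The helper name `linear_offset` is hypothetical and not part of R or Rcpp. Each index fits in 32 bits, but the position in the underlying vector is computed in 64 bits:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of R or Rcpp): column-major offset of
// element (row, col) in a matrix with `nrow` rows, using 0-based indices.
// Each index fits in a 32-bit int; the product is widened to 64 bits first.
std::int64_t linear_offset(std::int32_t row, std::int32_t col, std::int32_t nrow) {
    assert(row >= 0 && col >= 0 && nrow > 0 && row < nrow);
    return static_cast<std::int64_t>(col) * nrow + row;
}
```

For the last element of a 50000 x 50000 matrix, `linear_offset(49999, 49999, 50000)` is 2499999999, which is already larger than 2^31-1 even though both indices are small.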

Example of a big vector:

R> n <- .Machine$integer.max + 100
R> tmpVec <- 1:n
R> length(tmpVec)
[1] 2147483747
R> newVec <- sqrt(tmpVec)
R> 

A couple of caveats

Before we get started, I'm assuming:

Constructing a large NumericMatrix

I think you may be running into a memory allocation issue, as the following works on my 32 GB machine:

Rcpp::cppFunction("NumericMatrix make_matrix(){
                   NumericMatrix m(50000, 50000);
                   return m;
                  }")

m = make_matrix()

object.size(m)

## 20000000200 bytes # about 20.0000002 gb

Running:

# Creates an 18.6 GiB (about 20 GB) matrix
m = matrix(0, ncol = 50000, nrow = 50000)

Rcpp::cppFunction("void get_length(NumericMatrix m){
                   Rcout << m.nrow() << ' ' << m.ncol(); 
            }")

get_length(m)
## 50000 50000

object.size(m)
## 20000000200 bytes # about 20.0000002 gb
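
The size reported above can be checked by hand. The following is a rough sketch in plain C++ (no Rcpp), assuming 8-byte doubles. Both helper names are hypothetical. It counts only the data, not R's object header or the dim attribute, which presumably account for the extra 200 bytes that object.size reports.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for the data of an nrow x ncol matrix
// of doubles, ignoring R's object header and attributes.
std::uint64_t dense_double_bytes(std::int32_t nrow, std::int32_t ncol) {
    assert(nrow >= 0 && ncol >= 0);
    return static_cast<std::uint64_t>(nrow) * static_cast<std::uint64_t>(ncol) * sizeof(double);
}

// Convert a byte count to GiB (1 GiB = 2^30 bytes).
double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For 50000 x 50000 this gives 20,000,000,000 bytes, which is 20 GB in decimal units and roughly 18.6 GiB in binary units. That is why both figures appear above for the same matrix.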

Matrix Bounds

In theory, the total number of elements in a matrix is bounded by (2^31 - 1)^2 = 4,611,686,014,132,420,609, since each dimension can be at most 2^31 - 1. Per the R documentation:

Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.

See the "Long Vectors" page in the R documentation.

Now, trying to fit 2^31 rows into a matrix:

m = matrix(nrow=2^31, ncol=1)

Error in matrix(nrow = 2^31, ncol = 1) :
  invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(nrow = 2^31, ncol = 1) :
  NAs introduced by coercion to integer range

The limit that both R and Rcpp adhere to for the number of rows or columns is:

.Machine$integer.max
## 2147483647

Note that 2^31 exceeds this limit by exactly 1:

2^31 = 2,147,483,648 > 2,147,483,647 = .Machine$integer.max

Maximum Number of Elements in a Vector

However, the limit on the length of a plain atomic vector is given as 2^52 (even though it should be in the ballpark of 2^64 - 1). The following example builds a vector of 2^32 elements by concatenating two vectors of 2^31 elements each:

v = numeric(2^31)
length(v)
## [1] 2147483648

object.size(v)
## 17179869224 bytes # about 17.179869224 gb

v2 = c(v,v)
length(v2)
## [1] 4294967296

object.size(v2)
## 34359738408 bytes # about 34.359738408 gb
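
Putting the two limits together: since a matrix is a vector with a dim attribute, it has to satisfy both the per-dimension limit and the vector length limit. Below is a minimal sketch in plain C++ (no Rcpp). It treats the 2^31 - 1 and 2^52 figures quoted above as assumptions. The names `kMaxDim`, `kMaxLength` and `fits_limits` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Limits quoted in the text, treated here as assumptions.
constexpr std::int64_t kMaxDim = 2147483647LL;           // 2^31 - 1, .Machine$integer.max
constexpr std::int64_t kMaxLength = 4503599627370496LL;  // 2^52, quoted vector length limit

// Hypothetical check: could an nrow x ncol matrix exist under those limits?
bool fits_limits(std::int64_t nrow, std::int64_t ncol) {
    if (nrow < 0 || ncol < 0) return false;
    if (nrow > kMaxDim || ncol > kMaxDim) return false;
    // No overflow here: both factors are at most 2^31 - 1.
    return nrow * ncol <= kMaxLength;
}
```

Under these assumptions `fits_limits(50000, 50000)` is true and `fits_limits(2147483648, 1)` is false. A square matrix with 2^31 - 1 rows and columns also fails, because its element count exceeds 2^52. Whether such an object can actually be allocated still depends on available memory.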

Suggestions

  1. Use bigmemory via Rcpp
  2. Maintain your own stack of vectors.
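
One possible reading of suggestion 2, sketched in plain C++ (no Rcpp): keep one vector per column, so that no single allocation has to hold the whole matrix and each index stays within the per-dimension limit. The class name `ColumnStack` and its interface are hypothetical. In an Rcpp setting the columns might instead be held in a list of NumericVector objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one std::vector<double> per column, so no single
// allocation has to hold the whole matrix.
class ColumnStack {
public:
    ColumnStack(std::size_t nrow, std::size_t ncol)
        : nrow_(nrow), cols_(ncol, std::vector<double>(nrow, 0.0)) {}

    // Bounds-checked (in debug builds) access to element (row, col), 0-based.
    double& at(std::size_t row, std::size_t col) {
        assert(col < cols_.size() && row < nrow_);
        return cols_[col][row];
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t ncol() const { return cols_.size(); }

    // Total number of stored elements across all columns.
    std::size_t size() const { return nrow_ * cols_.size(); }

private:
    std::size_t nrow_;
    std::vector<std::vector<double>> cols_;
};
```

The trade-off is that the data is no longer contiguous, so routines that expect a single dense block (for example BLAS calls) cannot be applied to the whole structure directly.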