I was trying to develop a program in R to estimate a Spearman correlation with Rcpp. It works, but only on matrices with fewer than roughly 45 000 to 50 000 vectors, and I don't know why it stops working above that size. I suppose there is a limit for that type of object; maybe it would help to work with it as a data.frame? I would really appreciate it if someone could give me some insight.
Here is my code. I have been trying to keep the variable I call "denominador" below the maximum integer value, which it exceeds. Maybe you could help me.
cppFunction('double spearman(NumericMatrix x){
  int nrow = x.nrow(), ncol = x.ncol();
  int nrow1 = nrow - 1;
  double out = 0;
  double cont = 0;
  double cont1 = 0;
  double r = 0;
  // n * (n^2 - 1), stored in an int as in my original attempt
  int denominador = ncol*(pow(ncol,2.0)-1);
  for(int i = 0; i < nrow1; i++){
    // Here I use every combination of vectors, starting with the first one, and so on
    for(int j = i + 1; j < nrow; j++){
      cont1 = 0;
      // Sum of squared differences between vectors i and j
      for(int t = 0; t < ncol; t++){
        cont = pow(x(i,t)-x(j,t), 2.0);
        cont1 += cont;
      }
      // Here I begin to store the mean correlation, in order to get a final mean of all the possible correlations
      r = 2*(1-6*(cont1/denominador))/(nrow*nrow1);
      out += r;
    }
  }
  return out;
}')
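One possible explanation for the cut-off, offered as a sketch rather than a confirmed diagnosis: if int is 32 bits, nrow*nrow1 no longer fits once nrow reaches 46 342, and denominador no longer fits once ncol reaches 1 291. The standalone C++ below (no Rcpp needed) checks those thresholds. The helper names spearman_denominator, ordered_pairs and fits_in_int are hypothetical and only illustrate doing the arithmetic in double or 64-bit integers instead of int.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: n * (n^2 - 1) computed in double,
// so the result cannot wrap around a 32-bit int.
double spearman_denominator(int n_cols) {
    assert(n_cols >= 2);
    const double n = static_cast<double>(n_cols);
    return n * (n * n - 1.0);
}

// Hypothetical helper: nrow * (nrow - 1) computed in 64-bit integers.
std::int64_t ordered_pairs(int n_rows) {
    assert(n_rows >= 1);
    return static_cast<std::int64_t>(n_rows) * (n_rows - 1);
}

// True when the value would still fit in a 32-bit signed int (at most 2^31 - 1).
bool fits_in_int(double value) {
    return value <= static_cast<double>(std::numeric_limits<std::int32_t>::max());
}
```

In the function above, the same idea would amount to declaring denominador as double and writing the divisor as (double)nrow * nrow1. That is an assumption about the fix, not something tested on the original data.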
To repeat more succinctly:
- You can have more than 2^31-1 elements in a vector.
- Matrices are vectors with dim attributes.
- You can have more than 2^31-1 elements in a matrix (i.e. n times k).
- Your row and column indices are still limited to 2^31-1.
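To make the points above concrete, here is a small standalone C++ sketch (not Rcpp code; flat_index is a hypothetical name). It assumes R's column-major layout, where element (row, col) sits at zero-based offset row + col * nrow in the underlying vector. Each index fits in a 32-bit int, but the offset has to be 64-bit once the matrix holds more than 2^31-1 elements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero-based offset of element (row, col) in a
// column-major matrix with n_rows rows. The row and column indices are
// 32-bit, while the offset is widened to 64 bits before multiplying.
std::int64_t flat_index(std::int32_t row, std::int32_t col, std::int32_t n_rows) {
    assert(row >= 0 && row < n_rows && col >= 0);
    return static_cast<std::int64_t>(col) * n_rows + row;
}
```

For a 50000 x 50000 matrix the last element sits at offset 2,499,999,999, which is already beyond 2^31-1 even though both indices are small.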
Example of a big vector:
R> n <- .Machine$integer.max + 100
R> tmpVec <- 1:n
R> length(tmpVec)
[1] 2147483747
R> newVec <- sqrt(tmpVec)
R>
A couple caveats
Before we get started, I'm assuming:
- R > 3.0.0
  - Long vectors that allow for 2^52 elements are then supported.
- Rcpp > 0.12.0
  - Patch where thirdwing replaced instances of int and size_t with R_xlen_t and R_xlength. See release post for more details...
Constructing a large NumericMatrix
I think you may be running into a memory allocation issue, as the following works on my 32 GB machine:
Rcpp::cppFunction("NumericMatrix make_matrix(){
NumericMatrix m(50000, 50000);
return m;
}")
m = make_matrix()
object.size(m)
## 20000000200 bytes # about 20.0000002 gb
Running:
# Creates an 18.6gb matrix!!!
m = matrix(0, ncol = 50000, nrow = 50000)
Rcpp::cppFunction("void get_length(NumericMatrix m){
Rcout << m.nrow() << ' ' << m.ncol();
}")
get_length(m)
## 50000 50000
object.size(m)
## 20000000200 bytes # about 20.0000002 gb
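The two figures quoted above (18.6 gb and about 20 gb) appear to describe the same matrix in different units. A rough standalone C++ check, assuming 8-byte doubles and using the hypothetical helper names dense_double_bytes and bytes_to_gib:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for the data of a dense matrix of doubles.
// All arithmetic is 64-bit, so 50000 * 50000 * 8 does not overflow.
std::int64_t dense_double_bytes(std::int64_t n_rows, std::int64_t n_cols) {
    assert(n_rows >= 0 && n_cols >= 0);
    return n_rows * n_cols * static_cast<std::int64_t>(sizeof(double));
}

// Convert a byte count to GiB (units of 2^30 bytes).
double bytes_to_gib(std::int64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For 50000 x 50000 this gives 20,000,000,000 bytes of data, or roughly 18.6 GiB. object.size reports 200 bytes more than that, presumably for the object header and the dim attribute.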
Matrix Bounds
In theory, the total number of elements in a matrix is bounded by (2^31 - 1)^2 = 4,611,686,014,132,420,609, per:
Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.
See Long Vector
Now, trying to fit 2^31 rows into a matrix:
m = matrix(nrow=2^31, ncol=1)
Error in matrix(nrow = 2^31, ncol = 1) : invalid 'nrow' value (too large or NA)
In addition: Warning message: In matrix(nrow = 2^31, ncol = 1) :
NAs introduced by coercion to integer range
The limit both R and Rcpp adhere to regarding the column/row is:
.Machine$integer.max
## 2147483647
Note that 2^31 exceeds this limit by exactly one:
2^31 = 2,147,483,648 > 2,147,483,647 = .Machine$integer.max
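The same bounds can be checked in a few lines of standalone C++. This is only a sketch of the arithmetic, with hypothetical names (kMaxDim, fits_r_dimension, element_count), and does not call into R or Rcpp.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest extent a single matrix dimension may take: 2^31 - 1.
constexpr std::int64_t kMaxDim = std::numeric_limits<std::int32_t>::max();

// Hypothetical check: does a requested row or column count fit the limit?
bool fits_r_dimension(std::int64_t extent) {
    return extent >= 0 && extent <= kMaxDim;
}

// Hypothetical helper: total element count, computed in 64-bit.
// (2^31 - 1)^2 is below 2^63 - 1, so the product cannot overflow here.
std::int64_t element_count(std::int64_t n_rows, std::int64_t n_cols) {
    assert(fits_r_dimension(n_rows) && fits_r_dimension(n_cols));
    return n_rows * n_cols;
}
```

With both dimensions at their maximum, element_count returns 4,611,686,014,132,420,609, while a request for 2^31 rows is rejected by fits_r_dimension.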
Maximum Amount of Elements in a Vector
However, the limit for a plain atomic vector is given as 2^52 elements (even though it should be in the ballpark of 2^64 - 1). The following example illustrates reaching 2^32 elements by concatenating two vectors of 2^31 elements each:
v = numeric(2^31)
length(v)
## [1] 2147483648
object.size(v)
## 17179869224 bytes # about 17.179869224 gb
v2 = c(v,v)
length(v2)
## 4294967296
object.size(v2)
## 34359738408 bytes # about 34.359738408 gb
Suggestions
- Use bigmemory via Rcpp
- Maintain your own stack of vectors.
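As one reading of the second suggestion, here is a minimal standalone C++ sketch of a vector stored as several smaller chunks and addressed with a 64-bit index. ChunkedVector and its members are hypothetical names made up for illustration. It is not an Rcpp or bigmemory API, and a real implementation would need to decide how the chunks are allocated and shared with R.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: one logical vector of doubles split into chunks,
// so no single allocation has to hold every element.
class ChunkedVector {
public:
    ChunkedVector(std::int64_t length, std::int64_t chunk_size)
        : length_(length), chunk_size_(chunk_size) {
        assert(length >= 0 && chunk_size > 0);
        const std::int64_t n_chunks = (length + chunk_size - 1) / chunk_size;
        chunks_.reserve(static_cast<std::size_t>(n_chunks));
        for (std::int64_t c = 0; c < n_chunks; ++c) {
            // The last chunk may be shorter than the others.
            const std::int64_t remaining = length - c * chunk_size;
            const std::int64_t this_size =
                remaining < chunk_size ? remaining : chunk_size;
            chunks_.emplace_back(static_cast<std::size_t>(this_size), 0.0);
        }
    }

    std::int64_t size() const { return length_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // Map a 64-bit logical index to (chunk, offset within chunk).
    double& at(std::int64_t i) {
        assert(i >= 0 && i < length_);
        return chunks_[static_cast<std::size_t>(i / chunk_size_)]
                      [static_cast<std::size_t>(i % chunk_size_)];
    }

private:
    std::int64_t length_;
    std::int64_t chunk_size_;
    std::vector<std::vector<double>> chunks_;
};
```

For example, ChunkedVector v(10, 4) holds ten zero-initialised elements in three chunks, and v.at(9) refers to the second element of the last chunk.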
Source: https://stackoverflow.com/questions/38757326/is-there-a-limit-on-working-with-matrix-in-r-with-rcpp