Question
I know this question has been asked in the past (here and here, for example), but those questions are years old and unresolved. I am wondering if any solutions have been created since then. The issue is that the Matrix package in R cannot handle long vectors (length greater than 2^31 - 1). In my case, a sparse matrix is necessary for running an XGBoost model because of memory and time constraints. The XGBoost xgb.DMatrix supports using a dgCMatrix object. However, due to the size of my data, trying to create a sparse matrix results in an error. Here's an example of the issue. (Warning: this uses 50-60 GB RAM.)
i <- rep(1, 2^31)
j <- i
j[(2^30): length(j)] <- 2
x <- i
s <- sparseMatrix(i = i, j = j, x = x)
Error in validityMethod(as(object, superClass)) : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:137
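For readers who want to see why this example crosses the limit without allocating 50 GB, here is a minimal base R sketch. The variable names are illustrative only, and the memory figure assumes that i, j and x are each stored as a double vector (8 bytes per element), which is what rep(1, 2^31) produces.

```r
# Largest value an R integer can hold; any vector longer than this is a "long vector".
int_max <- .Machine$integer.max

# Number of triplet entries in the example, kept as a double so it cannot overflow.
n_entries <- 2^31

# TRUE: the example needs exactly one element more than the 32-bit limit allows.
exceeds_limit <- n_entries > int_max

# Rough size of the three input vectors alone (i, j, x), in GiB.
mem_gib <- 3 * 8 * n_entries / 2^30

print(int_max)
print(exceeds_limit)
print(mem_gib)
```

The three input vectors account for 48 GiB on their own, which is consistent with the 50-60 GB warning above.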
As of 2019, are there any solutions to this issue?
I am using the latest version of the Matrix package, 1.2-15.
Answer 1:
The sparse matrix algebra R package spam with its spam64 extension supports sparse matrices with more than 2^31-1 non-zero elements.
A simple example (requires ~50 GB of memory and takes ~5 minutes to run):
## -- a regular 32-bit spam matrix
library(spam) # version 2.2-2
s <- spam(1:2^30)
summary(s)
## Matrix object of class 'spam' of dimension 1073741824x1,
## with 1073741824 (row-wise) nonzero elements.
## Density of the matrix is 100%.
## Class 'spam'
## -- a 64-bit spam matrix with 2^31 non-zero entries
library(spam64)
s <- cbind(s, s)
summary(s)
## Matrix object of class 'spam' of dimension 1073741824x2,
## with 2147483648 (row-wise) nonzero elements.
## Density of the matrix is 100%.
## Class 'spam'
## -- add zeros to make the dimension 2^31 x 2^31
pad(s) <- c(2^31, 2^31)
summary(s)
## Matrix object of class 'spam' of dimension 2147483648x2147483648,
## with 2147483648 (row-wise) nonzero elements.
## Density of the matrix is 4.66e-08%.
## Class 'spam'
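The same sequence of calls can be tried at a toy scale that fits in any R session. This is only a sketch of the steps above with tiny dimensions: it assumes the spam package is installed, it does not need spam64, and it does not exercise the 64-bit code path at all.

```r
library(spam)

# A 4 x 1 spam matrix, the small-scale analogue of spam(1:2^30).
s <- spam(1:4)

# Bind two copies side by side: 4 x 2 with 8 stored entries.
s <- cbind(s, s)
n_stored <- length(s@entries)

# Zero-pad to 8 x 8; the stored entries are unchanged, only the dimension grows.
pad(s) <- c(8, 8)

print(dim(s))
print(n_stored)
print(length(s@entries))
```

At full scale the only difference in the calls is loading spam64 before the number of stored entries exceeds 2^31 - 1, as in the example above.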
Some links:
- https://cran.r-project.org/package=spam
- https://cran.r-project.org/package=spam64
- https://cran.r-project.org/package=dotCall64
- https://doi.org/10.1016/j.cageo.2016.11.015
- https://doi.org/10.1016/j.softx.2018.06.002
I am one of the authors of dotCall64 and spam.
Source: https://stackoverflow.com/questions/54405435/sparse-matrix-support-for-long-vectors-over-231-elements