Sparse matrix support for long vectors (over 2^31 elements)

Posted by ε祈祈猫儿з on 2020-05-29 07:37:26

Question


I know this question has been asked before (here and here, for example), but those questions are years old and remain unresolved. I am wondering whether any solutions have appeared since then. The issue is that the Matrix package in R cannot handle long vectors (length greater than 2^31 - 1). In my case, a sparse matrix is necessary for running an XGBoost model because of memory and time constraints. XGBoost's xgb.DMatrix accepts a dgCMatrix object. However, because of the size of my data, trying to create the sparse matrix results in an error. Here is an example that reproduces the issue. (Warning: this uses 50-60 GB of RAM.)

i <- rep(1, 2^31)
j <- i
j[(2^30): length(j)] <- 2
x <- i
s <- sparseMatrix(i = i, j = j, x = x)

Error in validityMethod(as(object, superClass)) : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:137
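
For context, here is a minimal sketch, small enough to run on any machine, of where the limit presumably comes from. It assumes only that the Matrix package is installed. A dgCMatrix keeps its row indices and column pointers in integer slots, and R integers are 32-bit, so neither the indices nor the slot lengths can exceed 2^31 - 1.

```r
library(Matrix)

# Largest value an R integer can hold: 2^31 - 1
max_int <- .Machine$integer.max

# A tiny sparse matrix built the same way as in the failing example
s <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(1, 1))

class(s)      # the compressed column class used by xgb.DMatrix
typeof(s@i)   # storage type of the row indices
typeof(s@p)   # storage type of the column pointers
typeof(s@x)   # storage type of the non-zero values
```

With more than 2^31 - 1 non-zero entries, the `i` and `x` slots would themselves have to be long vectors, which is what the validity check rejects.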

As of 2019, are there any solutions to this issue?

I am using the latest version of the Matrix package, 1.2-15.


Answer 1:


The sparse matrix algebra R package spam with its spam64 extension supports sparse matrices with more than 2^31-1 non-zero elements.

A simple example (requires ~50 GB of memory and takes ~5 minutes to run):

## -- a regular 32-bit spam matrix
library(spam) # version 2.2-2
s <- spam(1:2^30)
summary(s) 
## Matrix object of class 'spam' of dimension 1073741824x1,
##     with 1073741824 (row-wise) nonzero elements.
##     Density of the matrix is 100%.
## Class 'spam'

## -- a 64-bit spam matrix with 2^31 non-zero entries
library(spam64)
s <- cbind(s, s) 
summary(s) 
## Matrix object of class 'spam' of dimension 1073741824x2,
##     with 2147483648 (row-wise) nonzero elements.
##     Density of the matrix is 100%.
## Class 'spam'

## -- add zeros to make the dimension 2^31 x 2^31
pad(s) <- c(2^31, 2^31) 
summary(s) 
## Matrix object of class 'spam' of dimension 2147483648x2147483648,
##     with 2147483648 (row-wise) nonzero elements.
##     Density of the matrix is 4.66e-08%.
## Class 'spam'
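
To try the same sequence of calls without 50 GB of memory, a scaled-down sketch may help. It assumes only that the spam package is installed; at this size the 32-bit structure is sufficient, so spam64 is not loaded, and the sizes (4 and 8) are arbitrary illustration values.

```r
library(spam)

# A 4 x 1 spam matrix with four non-zero entries
s <- spam(1:4)

# Bind two copies column-wise: 4 x 2 with eight non-zero entries
s <- cbind(s, s)

# Pad with zeros up to 8 x 8; the stored entries are unchanged
pad(s) <- c(8, 8)

dim(s)
length(s@entries)   # number of stored values
summary(s)
```

The full-size example differs only in scale: once the number of non-zero entries passes 2^31 - 1, the 64-bit structure provided by spam64 is needed.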

Some links:

  • https://cran.r-project.org/package=spam
  • https://cran.r-project.org/package=spam64
  • https://cran.r-project.org/package=dotCall64
  • https://doi.org/10.1016/j.cageo.2016.11.015
  • https://doi.org/10.1016/j.softx.2018.06.002

I am one of the authors of dotCall64 and spam.



Source: https://stackoverflow.com/questions/54405435/sparse-matrix-support-for-long-vectors-over-231-elements
