How to optimize Read and Write to subsections of a matrix in R (possibly using data.table)

TL;DR

What is the fastest method in R for reading and writing a subset of columns from a very large matrix? I attempt a solution with data

2 Answers

逝去的感伤

2020-12-13 17:50

Here's what I had in mind. This could probably be much sexier with Rcpp and friends, but I'm not as familiar with those tools.

#include <string.h>      /* memcpy */
#include <R.h>
#include <Rinternals.h>
#include <Rdefines.h>

/* Overwrite, in place, the columns of mat listed in loc (1-based)
   with the columns of matAdd. No copy of mat is made. */
SEXP addCol(SEXP mat, SEXP loc, SEXP matAdd)
{
    int i, nr = nrows(mat), nc = ncols(matAdd), ll = length(loc);

    if(ll != nc)
        error("length(loc) must equal ncol(matAdd)");
    if(TYPEOF(mat) != TYPEOF(matAdd))
        error("mat and matAdd must be the same type");
    if(nr != nrows(matAdd))
        error("mat and matAdd must have the same number of rows");
    if(TYPEOF(loc) != INTSXP)
        error("loc must be integer");

    int *iloc = INTEGER(loc);

    /* R matrices are column-major, so column j starts at offset (j-1)*nr. */
    switch(TYPEOF(mat)) {
        case REALSXP:
            for(i=0; i < ll; i++)
                memcpy(&(REAL(mat)[(iloc[i]-1)*nr]),
                       &(REAL(matAdd)[i*nr]), nr*sizeof(double));
            break;
        case INTSXP:
            for(i=0; i < ll; i++)
                memcpy(&(INTEGER(mat)[(iloc[i]-1)*nr]),
                       &(INTEGER(matAdd)[i*nr]), nr*sizeof(int));
            break;
        default:
            error("unsupported type");
    }
    return R_NilValue;
}

Put the above function in addCol.c, then run R CMD SHLIB addCol.c. Then in R:

addColC <- dyn.load("addCol.so")$addCol
.Call(addColC, mat, Vsub, mat[,Vsub]+toinsert)

The slight advantage to this approach over Roland's is that this only does the assignment. His function does the addition for you, which is faster, but also means you need a separate C/C++ function for every operation you need to do.
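The offset arithmetic behind the memcpy calls can be tried outside of R. What follows is only a minimal sketch in plain C, assuming a column-major double array like the one R uses for numeric matrices. The name assign_cols, its signature, and the bounds check are hypothetical additions for illustration; they are not part of addCol above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the columns of src (nr x nloc) into the columns of mat listed in loc.
 * Both arrays are column-major; loc holds 1-based column indices, as in R.
 * Returns 0 on success, or -1 (leaving mat untouched) if any index falls
 * outside 1..nc. */
int assign_cols(double *mat, size_t nr, size_t nc,
                const int *loc, size_t nloc, const double *src)
{
    assert(mat != NULL && loc != NULL && src != NULL);

    /* Validate every index before writing anything. */
    for (size_t i = 0; i < nloc; i++) {
        if (loc[i] < 1 || (size_t)loc[i] > nc)
            return -1;
    }

    /* Column j (1-based) starts at element (j - 1) * nr. */
    for (size_t i = 0; i < nloc; i++) {
        size_t col = (size_t)loc[i] - 1;
        memcpy(mat + col * nr, src + i * nr, nr * sizeof(double));
    }
    return 0;
}
```

Because each column is one contiguous run of nr elements, a single memcpy per column is enough. The same trick would not work for selecting rows.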

情书的邮戳

2020-12-13 17:55

Fun with Rcpp:

You can use Eigen's Map class to modify an R object in place.

library(RcppEigen)
library(inline)

incl <- '
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXi;
typedef Map<MatrixXd> MapMatd;
typedef Map<VectorXi> MapVeci;
'

body <- '
MapMatd A(as<MapMatd>(AA));
const MapMatd B(as<MapMatd>(BB));
const MapVeci ix(as<MapVeci>(ind));
const int mB(B.cols());
for (int i = 0; i < mB; ++i) {
  A.col(ix.coeff(i)-1) += B.col(i);
}
'

funRcpp <- cxxfunction(signature(AA = "matrix", BB ="matrix", ind = "integer"),
                       body, "RcppEigen", incl)

set.seed(94253)
K <- 100
V <- 100000
mat2 <- mat <- matrix(runif(K*V),nrow=K,ncol=V)

Vsub <- sample(1:V, 20)
toinsert <- matrix(runif(K*length(Vsub)), nrow=K, ncol=length(Vsub))

mat[,Vsub] <- mat[,Vsub] + toinsert
invisible(funRcpp(mat2, toinsert, Vsub))

all.equal(mat, mat2)
#[1] TRUE

library(microbenchmark)
microbenchmark(mat[,Vsub] <- mat[,Vsub] + toinsert,
               funRcpp(mat2, toinsert, Vsub))
# Unit: microseconds
#                                   expr    min     lq  median      uq       max neval
#  mat[, Vsub] <- mat[, Vsub] + toinsert 49.273 49.628 50.3250 50.8075 20020.400   100
#          funRcpp(mat2, toinsert, Vsub)  6.450  6.805  7.6605  7.9215    25.914   100

I think this is basically what @Joshua Ulrich proposed. His warnings regarding breaking R's functional paradigm apply.

I do the addition in C++, but it is trivial to change the function to only do assignment.
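To show how small the difference between adding and assigning is, here is a rough sketch in plain C rather than Eigen. It assumes column-major double arrays and 1-based column indices. The names update_cols, col_op, COL_ASSIGN and COL_ADD are hypothetical and appear nowhere in the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch between the two operations discussed above. */
typedef enum { COL_ASSIGN, COL_ADD } col_op;

/* Update, in place, the columns of a (nr rows, column-major) listed in ix
 * (1-based) using the columns of b (nr x nix). The caller is assumed to
 * pass valid indices; no bounds checking is done here. */
void update_cols(double *a, size_t nr, const int *ix, size_t nix,
                 const double *b, col_op op)
{
    assert(a != NULL && ix != NULL && b != NULL);

    for (size_t i = 0; i < nix; i++) {
        double *dst = a + ((size_t)ix[i] - 1) * nr;  /* target column in a */
        const double *src = b + i * nr;              /* i-th column of b   */
        for (size_t r = 0; r < nr; r++) {
            if (op == COL_ADD)
                dst[r] += src[r];   /* analogous to A.col(j) += B.col(i) */
            else
                dst[r] = src[r];    /* analogous to A.col(j) = B.col(i)  */
        }
    }
}
```

In the Eigen version, the corresponding change would be replacing `+=` with `=` in the loop body.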

Obviously, if you can implement your whole loop in Rcpp, you avoid repeated function calls at the R level and will gain performance.
