What is the fastest method in R for reading and writing a subset of columns from a very large matrix? I attempt a solution with data
Fun with Rcpp:
You can use Eigen's Map class to modify an R object in place.
library(RcppEigen)
library(inline)
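incl <- '
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXi;
typedef Map<MatrixXd> MapMatd;
typedef Map<VectorXi> MapVeci;
'
body <- '
// Map the R objects directly, without copying them
MapMatd A(as<MapMatd>(AA));
const MapMatd B(as<MapMatd>(BB));
const MapVeci ix(as<MapVeci>(ind));
const int mB(B.cols());
for (int i = 0; i < mB; ++i)
{
// ind holds 1-based R column indices; Eigen columns are 0-based
A.col(ix.coeff(i)-1) += B.col(i);
}
'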
funRcpp <- cxxfunction(signature(AA = "matrix", BB ="matrix", ind = "integer"),
body, "RcppEigen", incl)
set.seed(94253)
K <- 100
V <- 100000
mat2 <- mat <- matrix(runif(K*V),nrow=K,ncol=V)
Vsub <- sample(1:V, 20)
toinsert <- matrix(runif(K*length(Vsub)), nrow=K, ncol=length(Vsub))
mat[,Vsub] <- mat[,Vsub] + toinsert
invisible(funRcpp(mat2, toinsert, Vsub))
all.equal(mat, mat2)
#[1] TRUE
library(microbenchmark)
microbenchmark(mat[,Vsub] <- mat[,Vsub] + toinsert,
funRcpp(mat2, toinsert, Vsub))
# Unit: microseconds
#                                   expr    min     lq  median      uq       max neval
#  mat[, Vsub] <- mat[, Vsub] + toinsert 49.273 49.628 50.3250 50.8075 20020.400   100
#          funRcpp(mat2, toinsert, Vsub)  6.450  6.805  7.6605  7.9215    25.914   100
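I think this is basically what @Joshua Ulrich proposed. His warnings regarding breaking R's functional paradigm apply: funRcpp modifies mat2 in place, so any other variable that still shares memory with it changes as well.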
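To make that warning concrete, here is a minimal sketch of the aliasing hazard. It uses Rcpp's NumericMatrix instead of Eigen's Map only to keep it short; bumpFirstCol is a hypothetical helper, not part of the code above.

```r
library(Rcpp)

# Hypothetical helper (an assumption for illustration): adds 1 to the
# first column of A, writing straight into the memory of the R object.
cppFunction('
void bumpFirstCol(NumericMatrix A) {
  for (int i = 0; i < A.nrow(); ++i) A(i, 0) += 1.0;
}
')

a <- matrix(0, nrow = 2, ncol = 2)
b <- a            # no copy is made here; a and b share the same memory
bumpFirstCol(a)   # modifies that shared memory in place
b[, 1]            # b has changed too, although only a was passed
```

If the original has to survive, one option is to make a deep copy on the C++ side with Rcpp's clone() before modifying, at the cost of the copy you were trying to avoid.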
I do the addition in C++, but it is trivial to change the function to only do assignment.
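As a sketch of that change, the variant below replaces += with = so the selected columns are overwritten instead of incremented. The names bodyAssign and funAssign and the small test matrices are assumptions made for illustration.

```r
library(RcppEigen)
library(inline)

incl <- '
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXi;
typedef Map<MatrixXd> MapMatd;
typedef Map<VectorXi> MapVeci;
'

# Same structure as the body above, but "=" replaces "+=".
bodyAssign <- '
MapMatd       A(as<MapMatd>(AA));
const MapMatd B(as<MapMatd>(BB));
const MapVeci ix(as<MapVeci>(ind));
const int     mB(B.cols());
for (int i = 0; i < mB; ++i)
{
  A.col(ix.coeff(i) - 1) = B.col(i);  // 1-based R index to 0-based Eigen index
}
'

funAssign <- cxxfunction(signature(AA = "matrix", BB = "matrix", ind = "integer"),
                         bodyAssign, "RcppEigen", incl)

# Small check: overwrite columns 2 and 5 of a zero matrix.
m <- matrix(0, nrow = 3, ncol = 5)
new_cols <- matrix(as.numeric(1:6), nrow = 3, ncol = 2)
invisible(funAssign(m, new_cols, c(2L, 5L)))
```

Both matrices must be of type double and ind must be integer, since the Map types point directly at the underlying R memory.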
Obviously, if you can implement your whole loop in Rcpp, you avoid repeated function calls at the R level and will gain performance.
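What moving the loop into C++ could look like depends on your actual loop, which I don't know. The sketch below assumes the updates are available up front as two parallel lists (one matrix and one vector of column indices per step); funBatch, BBlist and indlist are hypothetical names.

```r
library(RcppEigen)
library(inline)

incl <- '
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXi;
typedef Map<MatrixXd> MapMatd;
typedef Map<VectorXi> MapVeci;
'

# Hypothetical batched variant: one call from R, all update steps in C++.
bodyBatch <- '
MapMatd A(as<MapMatd>(AA));
Rcpp::List Bs(BBlist);
Rcpp::List ixs(indlist);
const int nSteps(Bs.size());
for (int k = 0; k < nSteps; ++k)
{
  SEXP Bk = Bs[k];
  SEXP ixk = ixs[k];
  const MapMatd B(as<MapMatd>(Bk));
  const MapVeci ix(as<MapVeci>(ixk));
  const int mB(B.cols());
  for (int i = 0; i < mB; ++i)
  {
    A.col(ix.coeff(i) - 1) += B.col(i);  // 1-based R index to 0-based Eigen index
  }
}
'

funBatch <- cxxfunction(signature(AA = "matrix", BBlist = "list", indlist = "list"),
                        bodyBatch, "RcppEigen", incl)

# Two steps: add 1 to column 3, then add 2 to columns 1 and 3.
m <- matrix(0, nrow = 2, ncol = 4)
steps <- list(matrix(1, 2, 1), matrix(2, 2, 2))
cols  <- list(3L, c(1L, 3L))
invisible(funBatch(m, steps, cols))
```

This only helps if the inserts do not depend on results computed in R between steps; otherwise that computation has to move into C++ as well.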