How to optimize Read and Write to subsections of a matrix in R (possibly using data.table)

耶瑟儿~ 2020-12-13 17:27

TL;DR

What is the fastest method in R for reading and writing a subset of columns from a very large matrix? I attempt a solution with data

2 Answers
  •  情书的邮戳
    2020-12-13 17:55

    Fun with Rcpp:

    You can use Eigen's Map class to modify an R object in place.

    library(RcppEigen)
    library(inline)
    
    incl <- '
    using  Eigen::Map;
    using  Eigen::MatrixXd;
    using  Eigen::VectorXi;
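    typedef  Map<MatrixXd>  MapMatd;
    typedef  Map<VectorXi>  MapVeci;
    '
    
    body <- '
    // Map the R objects without copying, so writes to A change the R matrix itself
    MapMatd              A(as<MapMatd>(AA));
    const MapMatd        B(as<MapMatd>(BB));
    const MapVeci        ix(as<MapVeci>(ind));
    const int            mB(B.cols());
    for (int i = 0; i < mB; ++i)
    {
      // ind holds 1-based R column indices, Eigen columns are 0-based
      A.col(ix.coeff(i)-1) += B.col(i);
    }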
    '
    
    funRcpp <- cxxfunction(signature(AA = "matrix", BB ="matrix", ind = "integer"), 
                           body, "RcppEigen", incl)
    
    set.seed(94253)
    K <- 100
    V <- 100000
    mat2 <-  mat <-  matrix(runif(K*V),nrow=K,ncol=V)
    
    Vsub <- sample(1:V, 20)
    toinsert <- matrix(runif(K*length(Vsub)), nrow=K, ncol=length(Vsub))
    mat[,Vsub] <- mat[,Vsub] + toinsert
    
    invisible(funRcpp(mat2, toinsert, Vsub))
    all.equal(mat, mat2)
    #[1] TRUE
    
    library(microbenchmark)
    microbenchmark(mat[,Vsub] <- mat[,Vsub] + toinsert,
                   funRcpp(mat2, toinsert, Vsub))
    # Unit: microseconds
    #                                  expr    min     lq  median      uq       max neval
    # mat[, Vsub] <- mat[, Vsub] + toinsert 49.273 49.628 50.3250 50.8075 20020.400   100
    #         funRcpp(mat2, toinsert, Vsub)  6.450  6.805  7.6605  7.9215    25.914   100
    

    I think this is basically what @Joshua Ulrich proposed. His warnings regarding breaking R's functional paradigm apply.
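
To make that warning concrete: a function that writes straight into R's memory also changes every other variable that still shares that memory. The following is only a minimal sketch. It uses Rcpp's `cppFunction` rather than the RcppEigen/inline setup above, and `addOneInPlace` is a hypothetical helper that is not part of the original answer.

```r
library(Rcpp)

# Hypothetical helper: adds 1 to every element of A without copying it.
cppFunction('
void addOneInPlace(NumericMatrix A) {
  for (R_xlen_t i = 0; i < A.size(); ++i) A[i] += 1.0;
}')

x <- matrix(0, nrow = 2, ncol = 2)  # double matrix, so Rcpp wraps it without a copy
y <- x                              # y shares memory with x until R itself modifies one of them

addOneInPlace(x)

identical(x, y)  # compare the two: y was never passed to the function
```

Under normal R semantics `y` would keep its old values, so code that relies on copy-on-modify can silently break once such a function is in play.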

    I do the addition in C++, but it is trivial to change the function to only do assignment.
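
A sketch of that assignment-only variant, assuming the same RcppEigen and inline setup as above. The names `bodyAssign` and `funAssign` are illustrative and not from the original answer; the only substantive change is `=` in place of `+=`.

```r
library(RcppEigen)
library(inline)

incl <- '
using  Eigen::Map;
using  Eigen::MatrixXd;
using  Eigen::VectorXi;
typedef  Map<MatrixXd>  MapMatd;
typedef  Map<VectorXi>  MapVeci;
'

# Same loop as the additive version, but the selected columns are overwritten.
bodyAssign <- '
MapMatd              A(as<MapMatd>(AA));
const MapMatd        B(as<MapMatd>(BB));
const MapVeci        ix(as<MapVeci>(ind));
for (int i = 0; i < B.cols(); ++i)
{
  A.col(ix.coeff(i)-1) = B.col(i);   // 1-based R index to 0-based Eigen index
}
return R_NilValue;
'

funAssign <- cxxfunction(signature(AA = "matrix", BB = "matrix", ind = "integer"),
                         bodyAssign, "RcppEigen", incl)

# Small usage example: overwrite columns 2 and 4 of a zero matrix with ones.
m   <- matrix(0, nrow = 3, ncol = 5)
new <- matrix(1, nrow = 3, ncol = 2)
invisible(funAssign(m, new, c(2L, 4L)))
```

The index vector must be of integer type (hence `2L`, `4L`), and both matrices must be of type double, because the `Map` types wrap the existing memory instead of converting it.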

    Obviously, if you can implement your whole loop in Rcpp, you avoid repeated function calls at the R level and will gain performance.
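
The question's actual loop is not shown here, so the following is only a sketch under an assumption: that each iteration adds one block of columns to the matrix. It uses plain Rcpp via `cppFunction` instead of RcppEigen, and `addColsRepeated` together with its `idx` and `vals` arguments are hypothetical names introduced for illustration.

```r
library(Rcpp)

# Hypothetical helper: applies a whole sequence of column updates in one call.
#   idx  : list of integer vectors with 1-based column indices
#   vals : list of double matrices; vals[[k]] has one column per entry of idx[[k]]
cppFunction('
void addColsRepeated(NumericMatrix A, List idx, List vals) {
  for (int k = 0; k < idx.size(); ++k) {
    IntegerVector ix = as<IntegerVector>(idx[k]);
    NumericMatrix B  = as<NumericMatrix>(vals[k]);
    for (int j = 0; j < B.ncol(); ++j)
      for (int i = 0; i < B.nrow(); ++i)
        A(i, ix[j] - 1) += B(i, j);   // modifies A in place
  }
}')

A    <- matrix(0, nrow = 4, ncol = 10)
idx  <- list(c(1L, 3L), c(3L, 7L))
vals <- list(matrix(1, 4, 2), matrix(2, 4, 2))

# One R-level call performs both updates; column 3 is touched twice.
addColsRepeated(A, idx, vals)
```

The design point is that the R interpreter is entered once rather than once per update, which matters when each individual update is as cheap as the few microseconds measured above.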
