Rcpp: my distance matrix program is slower than the function in a package

孤独总比滥情好 2020-12-21 17:33

I would like to calculate the pairwise Euclidean distance matrix. Following the suggestion of Dirk Eddelbuettel, I wrote the Rcpp programs below:

Nu         


        
2 answers
  •  情书的邮戳
    2020-12-21 18:26

    Rcpp vs. Internal R Functions (C/Fortran)

    First of all, writing an algorithm in Rcpp does not guarantee that it will beat the R equivalent, especially if the R function calls a C or Fortran routine to perform the bulk of the computation. When the function is written purely in R, on the other hand, translating it to Rcpp is very likely to yield the desired speed gain.

    Remember, when rewriting internal functions, you are going up against the R Core team of absolutely insane C programmers, who will most likely win out.

    Base Implementation of dist()

    Secondly, the distance calculation R uses is done in C as indicated by:

    .Call(C_Cdist, x, method, attrs, p)
    

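    This call is the last line of the dist() function's R source. The C routine likely has a slight advantage over the C++ version because it is more granular: it works on individual elements in plain loops instead of going through templated vector expressions.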

    Furthermore, the C implementation uses OpenMP when available to parallelize the computation.
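
    To make the two points above concrete, here is a minimal sketch in plain C++ (no Rcpp) of the kind of element-by-element loop a C routine can use. It is not R's actual C_Cdist source; the names row_distance and pairwise_distances are hypothetical, and the only assumptions are that the matrix is stored column-major, as R stores it, and that the OpenMP pragma is compiled in only when OpenMP is available.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not R's C_Cdist): Euclidean distance between rows
// i and j of an n-by-p matrix stored column-major.
// It accumulates in a scalar, so no temporary vectors are allocated.
double row_distance(const std::vector<double>& x, int n, int p, int i, int j) {
  double acc = 0.0;
  for (int k = 0; k < p; ++k) {
    const std::size_t col = static_cast<std::size_t>(k) * n;
    const double diff = x[i + col] - x[j + col];
    acc += diff * diff;
  }
  return std::sqrt(acc);
}

// Hypothetical driver: fills a full symmetric n-by-n distance matrix
// (column-major). Each outer iteration writes a disjoint set of cells,
// so the outer loop can be parallelized without locking.
std::vector<double> pairwise_distances(const std::vector<double>& x, int n, int p) {
  assert(n >= 0 && p >= 0);
  assert(x.size() == static_cast<std::size_t>(n) * p);
  std::vector<double> out(static_cast<std::size_t>(n) * n, 0.0);
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
#endif
  for (int i = 0; i < n; ++i) {
    for (int j = i + 1; j < n; ++j) {
      const double d = row_distance(x, n, p, i, j);
      out[i + static_cast<std::size_t>(j) * n] = d;  // upper triangle
      out[j + static_cast<std::size_t>(i) * n] = d;  // mirrored lower triangle
    }
  }
  return out;
}
```

    In this sketch the inner loop touches only scalars, whereas an expression such as sqrt(sum(pow(v1 - x.row(j), 2.0))) builds and evaluates vector expressions for every pair of rows, which is one plausible source of the gap.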

    Proposed modification

    Thirdly, changing the subset order slightly and avoiding the creation of an additional variable narrows the timing gap between the two versions.

    #include <Rcpp.h>
    
    // [[Rcpp::export]]
    Rcpp::NumericMatrix calcPWD1 (const Rcpp::NumericMatrix & x){
      unsigned int outrows = x.nrow(), i = 0, j = 0;
      double d;
      Rcpp::NumericMatrix out(outrows,outrows);
    
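      // i + 1 < outrows avoids unsigned underflow of outrows - 1 when x has no rows
      for (i = 0; i + 1 < outrows; i++){
        // Copy row i once and reuse it across the inner loop
        Rcpp::NumericVector v1 = x.row(i);
        for (j = i + 1; j < outrows ; j ++){
          // Euclidean distance between rows i and j via Rcpp sugar
          d = sqrt(sum(pow(v1-x.row(j), 2.0)));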
          out(j,i)=d;
          out(i,j)=d;
        }
      }
    
      return out;
    }
    
