Rcpp: my distance matrix program is slower than the function in package

我的未来我决定 提交于 2019-11-29 15:59:34

Rcpp vs. Internal R Functions (C/Fortran)

First of all, just because you are writing the algorithm using Rcpp does not necessarily mean it will beat out the R equivalent, especially if the R function calls a C or Fortran routine to perform the bulk of the computations. In other cases where the function is written purely in R, there is a high probability that transforming it in Rcpp will yield the desired speed gain.

Remember, when rewriting internal functions, one is going up against the R Core team of absolutely insane C programmers most likely will win out.

Base Implementation of dist()

Secondly, the distance calculation R uses is done in C as indicated by:

.Call(C_Cdist, x, method, attrs, p)

, which is the last line of the dist() function's R source. This gives it a slight advantage vs. C++ as it more granular instead of templated.

Furthermore, the C implementation uses OpenMP when available to parallelize the computation.

Proposed modification

Thirdly, by changing the subset order slightly and avoiding creating an additional variable, the timings between versions decrease.

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericMatrix calcPWD1 (const Rcpp::NumericMatrix & x){
  unsigned int outrows = x.nrow(), i = 0, j = 0;
  double d;
  Rcpp::NumericMatrix out(outrows,outrows);

  for (i = 0; i < outrows - 1; i++){
    Rcpp::NumericVector v1 = x.row(i);
    for (j = i + 1; j < outrows ; j ++){
      d = sqrt(sum(pow(v1-x.row(j), 2.0)));
      out(j,i)=d;
      out(i,j)=d;
    }
  }

  return out;
}

You were almost there. But your inner loop body tried to do too much in one line. Template programming is hard enough as it is, and sometimes it is just better to spread instructions out a little to give the compiler a better chance. So I just made it five statements, and built immediatelt.

New code:

#include <Rcpp.h>

using namespace Rcpp;

double dist1 (NumericVector x, NumericVector y){
  int n = y.length();
  double total = 0;
  for (int i = 0; i < n ; ++i) {
    total += pow(x(i)-y(i),2.0);
  }
  total = sqrt(total);
  return total;
}

// [[Rcpp::export]]
NumericMatrix calcPWD (NumericMatrix x){
  int outrows = x.nrow();
  int outcols = x.nrow();
  NumericMatrix out(outrows,outcols);

  for (int i = 0 ; i < outrows - 1; i++){
    for (int j = i + 1  ; j < outcols ; j ++) {
      NumericVector v1 = x.row(i);
      NumericVector v2 = x.row(j-1);
      double d = dist1(v1, v2);
      out(j-1,i) = d;
      out(i,j-1)= d;
    }
  }
  return (out) ;
}

/*** R
M <- matrix(log(1:9), 3, 3)
calcPWD(M)
*/

Running it:

R> sourceCpp("/tmp/mikebrown.cpp")

R> M <- matrix(log(1:9), 3, 3)

R> calcPWD(M)
         [,1]     [,2] [,3]
[1,] 0.000000 0.740322    0
[2,] 0.740322 0.000000    0
[3,] 0.000000 0.000000    0
R> 

You may want to check your indexing logic though. Looks like you missed more comparisons.

Edit: For kicks, here is a more compact version of your distance function:

// [[Rcpp::export]]
double dist2(NumericVector x, NumericVector y){
  double d = sqrt( sum( pow(x - y, 2) ) );
  return d;
}
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!