Quickest way to find closest elements in an array in R

青春惊慌失措 2021-01-29 00:21

I would like to find the fastest way in R to identify the indexes of the elements in the Ytimes array that are closest to the given Xtimes values.

So far I have been using a simple for-loop ...

3 Answers
  •  青春惊慌失措
    2021-01-29 01:07

    Obligatory Rcpp solution. It takes advantage of the fact that your vectors are sorted and contain no duplicates to turn an O(n^2) search into an O(n) scan. May or may not be practical for your application ;)

    C++:

    #include <Rcpp.h>
    #include <cmath>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    IntegerVector closest_pts(NumericVector Xtimes, NumericVector Ytimes) {
      int xsize = Xtimes.size();
      int ysize = Ytimes.size();
      int y_ind = 0;
      double minval = R_PosInf;
      IntegerVector output(xsize);
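      for(int x_ind = 0; x_ind < xsize; x_ind++) {
        // Walk forward while each Y is strictly closer; never read past the end of Ytimes.
        while(y_ind < ysize && std::abs(Ytimes[y_ind] - Xtimes[x_ind]) < minval) {
          minval = std::abs(Ytimes[y_ind] - Xtimes[x_ind]);
          y_ind++;
        }
        // y_ind is now one past the closest element (0-based), i.e. its 1-based R index.
        output[x_ind] = y_ind;
        // Step back so the next X can still match this same Y.
        if (y_ind > 0) y_ind--;
        minval = R_PosInf;
      }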
      return output;
    }
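
For anyone who wants to try the linear scan outside R, here is a minimal standalone sketch of the same idea using plain std::vector. The function name nearest_index_sorted is made up for illustration and is not part of the answer. It assumes both inputs are sorted ascending, that ys is non-empty and has no duplicates, and it returns 1-based indexes to match R.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the answer): for each x, return the 1-based
// index of the nearest y. Assumes xs and ys are sorted ascending, ys is
// non-empty and has no duplicates. Ties go to the lower index.
std::vector<int> nearest_index_sorted(const std::vector<double>& xs,
                                      const std::vector<double>& ys) {
  std::vector<int> out;
  out.reserve(xs.size());
  std::size_t j = 0;  // 0-based position in ys; it never moves backwards
  for (double x : xs) {
    // Advance only while the next y is strictly closer to x.
    while (j + 1 < ys.size() &&
           std::abs(ys[j + 1] - x) < std::abs(ys[j] - x)) {
      ++j;
    }
    out.push_back(static_cast<int>(j) + 1);  // convert to R-style 1-based
  }
  return out;
}
```

Because j only ever moves forward, the total work is proportional to xs.size() + ys.size(), which is where the O(n) comes from.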
    

    R:

    microbenchmark::microbenchmark(
      for_loop = {
        for (i in 1:length(Xtimes)) {
          which.min(abs(Ytimes - Xtimes[i]))
        }
      },
      apply    = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
      fndIntvl = {
        Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
        Ytimes[ findInterval(Xtimes, Y2) ]
      },
      rcpp = closest_pts(Xtimes, Ytimes),
      times = 100
    )
    
    Unit: microseconds
         expr      min      lq     mean   median       uq      max neval cld
     for_loop 3321.840 3422.51 3584.452 3492.308 3624.748 10458.52   100   b
        apply   68.365   73.04  106.909   84.406   93.097  2345.26   100  a 
     fndIntvl   31.623   37.09   50.168   42.019   64.595   105.14   100  a 
         rcpp    2.431    3.37    5.647    4.301    8.259    10.76   100  a 
    
    identical(closest_pts(Xtimes, Ytimes), findInterval(Xtimes, Y2))
    # TRUE
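
The fndIntvl entry in the benchmark works by placing break points at the midpoints between neighbouring Ytimes values and looking up each Xtimes value among them. A rough C++ sketch of that midpoint idea, using std::upper_bound for the lookup, might look like the following. The name nearest_index_midpoints is hypothetical, ys is assumed sorted ascending and non-empty, and xs can be in any order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the fndIntvl idea: the break points are the
// midpoints between neighbouring ys, and a binary search over them finds the
// nearest y. Assumes ys is sorted ascending and non-empty; xs may be unsorted.
// An x that falls exactly on a midpoint goes to the upper neighbour.
std::vector<int> nearest_index_midpoints(const std::vector<double>& xs,
                                         const std::vector<double>& ys) {
  std::vector<double> mids;
  for (std::size_t i = 0; i + 1 < ys.size(); ++i) {
    mids.push_back(ys[i] + (ys[i + 1] - ys[i]) / 2.0);
  }
  std::vector<int> out;
  out.reserve(xs.size());
  for (double x : xs) {
    // The number of midpoints <= x is the 0-based index of the nearest y.
    auto it = std::upper_bound(mids.begin(), mids.end(), x);
    out.push_back(static_cast<int>(it - mids.begin()) + 1);  // 1-based
  }
  return out;
}
```

This costs a binary search per lookup rather than a single pass, but it does not need xs to be sorted, which may matter depending on where the X values come from.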
    
