I would like to find the fastest way in R to identify the indexes of the elements in the Ytimes array that are closest to the given Xtimes values.
So far I have been using a simple for-lo
Obligatory Rcpp solution. It takes advantage of the fact that your vectors are sorted and don't contain duplicates to turn an O(n^2) search into a single O(n) pass. May or may not be practical for your application ;)
C++:
#include <Rcpp.h>
#include <cmath>
using namespace Rcpp;
// [[Rcpp::export]]
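IntegerVector closest_pts(NumericVector Xtimes, NumericVector Ytimes) {
  int xsize = Xtimes.size();
  int ysize = Ytimes.size();
  int y_ind = 0;
  double minval = R_PosInf;
  IntegerVector output(xsize);
  for (int x_ind = 0; x_ind < xsize; x_ind++) {
    // Walk forward through Ytimes while the distance keeps shrinking;
    // the bounds check stops the scan at the end of Ytimes.
    while (y_ind < ysize && std::abs(Ytimes[y_ind] - Xtimes[x_ind]) < minval) {
      minval = std::abs(Ytimes[y_ind] - Xtimes[x_ind]);
      y_ind++;
    }
    // y_ind is now one past the closest element (0-based),
    // which is exactly its 1-based R index.
    output[x_ind] = y_ind;
    // Step back so the next X value can match the same Y element.
    if (y_ind > 0) y_ind--;
    minval = R_PosInf;
  }
  return output;
}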
R:
microbenchmark::microbenchmark(
  for_loop = {
    for (i in 1:length(Xtimes)) {
      which.min(abs(Ytimes - Xtimes[i]))
    }
  },
  apply = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
  fndIntvl = {
    Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
    Ytimes[ findInterval(Xtimes, Y2) ]
  },
  rcpp = closest_pts(Xtimes, Ytimes),
  times = 100
)
Unit: microseconds
     expr      min      lq     mean   median       uq      max neval cld
 for_loop 3321.840 3422.51 3584.452 3492.308 3624.748 10458.52   100   b
    apply   68.365   73.04  106.909   84.406   93.097  2345.26   100  a
 fndIntvl   31.623   37.09   50.168   42.019   64.595   105.14   100  a
     rcpp    2.431    3.37    5.647    4.301    8.259    10.76   100  a
identical(closest_pts(Xtimes, Ytimes), findInterval(Xtimes, Y2))
# TRUE
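The linear sweep only works when Xtimes is sorted as well. If that can't be guaranteed, one fallback is a binary search per query, which is roughly the idea behind the findInterval approach. Below is a minimal plain C++ sketch of that fallback, not the Rcpp function from this answer: closest_pts_bsearch is a hypothetical name, it uses std::vector instead of Rcpp types, and it assumes ys is sorted ascending and non-empty. Ties go to the lower index, as which.min would pick them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of the answer above): for each x, return the
// 1-based index of the closest element of ys.
// Assumptions: ys is sorted ascending and non-empty; xs may be in any order.
std::vector<int> closest_pts_bsearch(const std::vector<double>& xs,
                                     const std::vector<double>& ys) {
    assert(!ys.empty());
    std::vector<int> out;
    out.reserve(xs.size());
    for (double x : xs) {
        // First element >= x; the closest is this one or its predecessor.
        auto it = std::lower_bound(ys.begin(), ys.end(), x);
        if (it == ys.end()) {
            --it;  // x lies beyond the last element
        } else if (it != ys.begin() &&
                   std::abs(x - *(it - 1)) <= std::abs(*it - x)) {
            --it;  // predecessor is at least as close; ties go to the lower index
        }
        out.push_back(static_cast<int>(it - ys.begin()) + 1);  // 1-based, as in R
    }
    return out;
}
```

This costs O(n log m) for n queries against m sorted values, so it is slower than the sweep on sorted input but does not care about the order of the queries.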