I have two sets of points, called path and centers. For each point in path, I would like an efficient method for finding the ID of the closest point in centers.
Here is an Rcpp11 solution. Something similar might work with Rcpp with a few changes.
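    #define RCPP11_PARALLEL_MINIMUM_SIZE 1000
    #include <Rcpp11>

    inline double square(double x){
        return x*x ;
    }

    // [[Rcpp::export]]
    IntegerVector closest( DataFrame path, DataFrame centers ){
        NumericVector path_x = path["x"], path_y = path["y"] ;
        NumericVector centers_x = centers["x"], centers_y = centers["y"] ;
        int n_paths = path_x.size(), n_centers = centers_x.size() ;
        IntegerVector ids = sapply( seq_len(n_paths), [&](int i){
            // seq_len() counts from 1 as in R, so shift to a 0-based index
            double px = path_x[i-1], py=path_y[i-1] ;
            // the squared distance is enough to rank the centers
            auto get_distance = [&](int j){
                return square(px - centers_x[j]) + square(py-centers_y[j]) ;
            } ;
            double distance = get_distance(0) ;
            int res=0;
            // assumed completion (the listing is cut off after `j`): keep the nearest center seen so far
            for( int j=1; j<n_centers; j++){
                double d = get_distance(j) ;
                if( d < distance ){
                    distance = d ;
                    res = j ;
                }
            }
            return res + 1 ;   // 1-based id, as R expects
        }) ;
        return unique(ids) ;   // distinct center ids visited by the path
    }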
I get:

    > set.seed(1)
    > n <- 10000
    > x <- 100 * cumprod(1 + rnorm(n, 1e-04, 0.002))
    > y <- 50 * cumprod(1 + rnorm(n, 1e-04, 0.002))
    > path <- data.frame(cbind(x = x, y = y))
    > centers <- expand.grid(x = seq(0, 500, by = 0.5) +
    +     rnorm(1001), y = seq(0, 500, by = 0.2) + rnorm(2501))
    > system.time(closest(path, centers))
       user  system elapsed
     84.740   0.141  21.392
This takes advantage of the automatic parallelization of sugar, i.e. sapply runs in parallel. The timings above show it: user time is about four times the elapsed time. The #define RCPP11_PARALLEL_MINIMUM_SIZE 1000 line forces the parallel code path, which by default only kicks in from 10000 elements. Here the inner computation is expensive enough that parallelizing a smaller input is worth it.
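To make the idea of a size threshold concrete, here is a minimal sketch in standard C++ of a map over an index range that runs serially below a threshold and in chunks on several threads above it. It is only an illustration of the principle, not how Rcpp11 implements its parallel sugar: `parallel_map` and `kParallelMinimumSize` are hypothetical names, and the chunking strategy is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Illustrative threshold, playing the role that
// RCPP11_PARALLEL_MINIMUM_SIZE plays in the answer (hypothetical name).
constexpr std::size_t kParallelMinimumSize = 1000;

// Hypothetical helper: computes out[i] = f(i) for i in [0, n).
// Below `min_size` it runs serially; otherwise the index range is split
// into contiguous chunks, one per worker thread.
// `f` must be safe to call from several threads at once.
std::vector<int> parallel_map(std::size_t n,
                              const std::function<int(std::size_t)>& f,
                              std::size_t min_size = kParallelMinimumSize) {
    std::vector<int> out(n);

    // Each call writes a disjoint slice of `out`, so no locking is needed.
    auto run = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = f(i);
    };

    // hardware_concurrency() may return 0 when unknown; treat that as 1.
    const std::size_t hw =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());

    if (n < min_size || hw == 1) {
        run(0, n);  // not worth the thread start-up cost
        return out;
    }

    const std::size_t chunk = (n + hw - 1) / hw;  // ceiling division
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        workers.emplace_back(run, begin, std::min(n, begin + chunk));
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

Lowering the threshold only pays off when each call to `f` is expensive, as it is here where every path point scans all the centers; for cheap per-element work the thread start-up cost would dominate.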
Note that you need a development version of Rcpp11 because unique is broken in the released version.
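If the development version is not an option, the search and the de-duplication can also be written without any sugar. The following is a minimal sketch in standard C++, not code from Rcpp11: `Point`, `closest_ids` and `unique_in_order` are hypothetical names, and wrapping them for R (with Rcpp or Rcpp11) is left out.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper type: a plain 2-D point.
struct Point {
    double x;
    double y;
};

// Squared Euclidean distance; the square root is skipped because it
// does not change which center is nearest.
inline double squared_distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// For each path point, return the 1-based index of the nearest center.
// Assumes `centers` is non-empty; ties go to the lowest index.
std::vector<int> closest_ids(const std::vector<Point>& path,
                             const std::vector<Point>& centers) {
    std::vector<int> ids;
    ids.reserve(path.size());
    for (const Point& p : path) {
        std::size_t best = 0;
        double best_d = squared_distance(p, centers[0]);
        for (std::size_t j = 1; j < centers.size(); ++j) {
            const double d = squared_distance(p, centers[j]);
            if (d < best_d) {
                best_d = d;
                best = j;
            }
        }
        ids.push_back(static_cast<int>(best) + 1);
    }
    return ids;
}

// Drop repeated ids while keeping the order of first appearance,
// which is also what R's unique() does.
std::vector<int> unique_in_order(const std::vector<int>& ids) {
    std::unordered_set<int> seen;
    std::vector<int> out;
    for (int id : ids) {
        if (seen.insert(id).second) out.push_back(id);
    }
    return out;
}
```

This is still a brute-force scan, so its cost grows with the number of path points times the number of centers; it only removes the dependency on sugar's unique.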