I have two data sets from different stations. The data are basically data.frames with coordinates (longitudes and latitudes). Given the first data set (or vice versa), I want to
If you have extremely large datasets, a plain distance function can be cumbersome, because for each point in the reference data it must calculate the distance to every point in the other data set. The 'ann' function from the 'yaImpute' package is a very fast approximate nearest-neighbour routine that copes well with large numbers of points. It returns however many "closest" records you ask for (the value of k) as well as the distance to each of them.
Note: despite being an approximate nearest neighbour, the results are stable on repeated runs of the same data. It doesn't include a random selection of points or anything. See documentation.
FWIW, I'm really not kidding about fast. I've used this to find knn distances for two matrices, each with millions of points. Making a distance matrix for this or doing it iteratively row-by-row is either unfeasible or painfully slow.
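For comparison, here is a minimal base-R sketch of the exhaustive row-by-row search described above. The helper 'brute_knn' and the toy points are hypothetical and not part of yaImpute; the result is laid out as k index columns followed by k squared-distance columns, so it can be compared with what 'ann' reports on small inputs.

```r
# Hypothetical helper: exhaustive k-nearest-neighbour search in base R.
# ref    = matrix of candidate points (one point per row)
# target = matrix of query points (one point per row)
brute_knn <- function(ref, target, k) {
  ref <- as.matrix(ref)
  target <- as.matrix(target)
  out <- t(apply(target, 1, function(p) {
    d2 <- rowSums(sweep(ref, 2, p)^2)  # squared Euclidean distance to every ref point
    nearest <- order(d2)[seq_len(k)]   # row numbers of the k closest ref points
    c(nearest, d2[nearest])            # indices first, then squared distances
  }))
  unname(out)
}

# Toy data (made up for illustration)
ref    <- cbind(x = c(0, 3, 10), y = c(0, 4, 10))
target <- cbind(x = c(0, 9),     y = c(1, 9))

res <- brute_knn(ref, target, k = 2)
res
```

This is fine as a sanity check on a handful of points, but the work grows with nrow(ref) * nrow(target), which is exactly what becomes painful with millions of rows.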
Quick example:
# Hypothetical coordinate data
set.seed(2187); foo1 <- round(abs(data.frame(x=runif(5), y=runif(5))*100))
set.seed(2187); foo2 <- round(abs(data.frame(x=runif(10), y=runif(10))*100))
foo1; foo2
# the 'ann' function from the 'yaImpute' package
install.packages("yaImpute")  # only needed once
library(yaImpute)
# Approximate nearest-neighbour search, reporting the 3 nearest points (k=3)
# This command finds the 3 nearest points in foo2 for each point in foo1
# In the output (one row per point in foo1):
# The first k columns are the row numbers of the nearest points in foo2
# The next k columns (k+1 to 2k) are the *squared* euclidean distances
knn.out <- ann(as.matrix(foo2), as.matrix(foo1), k=3)
knn.out$knnIndexDist
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    4  729 1658 2213
[2,]    2    3    7   16  100 1025
[3,]    9    7    5   40   81  740
[4,]    4    1    6   16  580  673
[5,]    5    7    9    0  677  980
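Since the distance columns are squared, you will usually want to take the square root and pull out the single nearest match. A minimal sketch, assuming the matrix below is exactly the output printed above (it is typed in by hand here so the snippet runs without yaImpute; with the real object you would start from knn.out$knnIndexDist). The names 'idx', 'dists' and 'nearest' are just illustrative.

```r
# Values copied by hand from the knn.out$knnIndexDist output above (k = 3)
k <- 3
knn_index_dist <- matrix(c(
  1, 5, 4, 729, 1658, 2213,
  2, 3, 7,  16,  100, 1025,
  9, 7, 5,  40,   81,  740,
  4, 1, 6,  16,  580,  673,
  5, 7, 9,   0,  677,  980
), ncol = 2 * k, byrow = TRUE)

# Split the matrix into its two halves
idx   <- knn_index_dist[, seq_len(k), drop = FALSE]            # row numbers in foo2
dists <- sqrt(knn_index_dist[, k + seq_len(k), drop = FALSE])  # un-squared distances

# One row per foo1 point: its closest foo2 row and the distance to it
nearest <- data.frame(foo1_row = seq_len(nrow(idx)),
                      foo2_row = idx[, 1],
                      distance = dists[, 1])
nearest
```

Keep in mind these are plain Euclidean distances in whatever units the coordinates are in.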
https://cran.r-project.org/web/packages/yaImpute/index.html