Find nearest points of latitude and longitude from different data sets with different length

你的背包 2020-12-19 08:24

I have two data sets from different stations. Both are data.frames of coordinates (longitude and latitude). Given the first data set (or vice versa), I want to

5 Answers
  • 2020-12-19 08:59

    The function s2_closest_feature() from the s2 package finds nearest points from different data sets.

    For example, with your data:

    library(s2)
    set1_s2 <- s2_lnglat(set1$lon, set1$lat)
    set2_s2 <- s2_lnglat(set2$lon, set2$lat)
    set1$closest <- s2_closest_feature(set1_s2, set2_s2)
    set1
    #>         lon      lat closest
    #> 1  13.67111 48.39167      10
    #> 2  12.86695 48.14806      10
    #> 3  15.94223 48.72111      10
    #> 4  11.09974 47.18917       1
    #> 5  12.95834 47.05444       1
    #> 6  14.20389 47.12917       1
    #> 7  11.86389 47.30667       1
    #> 8  16.52667 47.84000       1
    #> 9  16.19306 47.30417       1
    #> 10 17.07139 48.10944       1
    
  • 2020-12-19 09:03

    I don't know exactly what you want, but maybe this gives you some hints.
    If you want the minimum value of each column:

      dd <- as.data.frame(dd)
      sapply(dd, min)                                  # minimum of each column
      paste(rownames(dd), ":", apply(dd,2,which.min))  # row index of each column minimum
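
    The answer does not say how dd is built. A minimal base R sketch, assuming dd is a matrix of distances with one row per set2 station and one column per set1 station. The coordinates and the names col_min and col_which are made up for illustration, and the distance is a plain planar distance in degrees:

```r
# Hypothetical station coordinates (not the asker's data).
set1 <- data.frame(lon = c(13.7, 11.1), lat = c(48.4, 47.2))
set2 <- data.frame(lon = c(11.0, 13.5, 16.0), lat = c(47.0, 48.5, 48.5))

# dd[i, j] = planar distance (in degrees) from set2 row i to set1 row j.
dd <- sqrt(outer(set2$lon, set1$lon, "-")^2 + outer(set2$lat, set1$lat, "-")^2)
dimnames(dd) <- list(paste0("set2_", seq_len(nrow(set2))),
                     paste0("set1_", seq_len(nrow(set1))))

col_min   <- apply(dd, 2, min)        # smallest distance for each set1 station
col_which <- apply(dd, 2, which.min)  # row of set2 that achieves it
data.frame(nearest = rownames(dd)[col_which], distance = col_min)
```

    Indexing rownames(dd) with the which.min result ties each column to the name of its nearest row, which only works as intended when rows and columns are laid out as assumed here.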
    
  • 2020-12-19 09:05

    Here is another possible solution:

    library(rgeos)
    set1sp <- SpatialPoints(set1)
    set2sp <- SpatialPoints(set2)
    set1$nearest_in_set2 <- apply(gDistance(set1sp, set2sp, byid=TRUE), 1, which.min)
    
    head(set1)
    ##        lon      lat nearest_in_set2
    ## 1 13.67111 48.39167              10
    ## 2 12.86695 48.14806              10
    ## 3 15.94223 48.72111              10
    ## 4 11.09974 47.18917               1
    ## 5 12.95834 47.05444               1
    ## 6 14.20389 47.12917               1
    
  • 2020-12-19 09:09

    You can use a series of apply commands to do this. Note that the x and y in the functions refer to rows of set2 and set1 respectively, not to the lat/lon coordinates; the coordinate pairs are passed as p1 and p2. (NOTE: Edited to correct the order of set1 and set2 in the calculations. The order determines whether you are calculating the value in set2 closest to each value in set1 or vice versa.)

    distp1p2 <- function(p1,p2) {
        dst <- sqrt((p1[1]-p2[1])^2+(p1[2]-p2[2])^2)
        return(dst)
    }
    
    dist2 <- function(y) min(apply(set2, 1, function(x) min(distp1p2(x,y))))
    
    apply(set1, 1, dist2)
    

    Or, if you want the station with the nearest point rather than the minimum distance, change min to which.min in dist2():

    dist2b <- function(y) which.min(apply(set2, 1, function(x) min(distp1p2(x,y))))
    apply(set1, 1, dist2b)
    

    And to get the lat-lon for that station:

    set2[apply(set1, 1, dist2b),]
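
    distp1p2() above measures straight-line distance in degrees, which can pick the wrong station because a degree of longitude is shorter than a degree of latitude away from the equator. A hedged base R sketch of the same search with great-circle (haversine) distance instead. The function haversine_km, the 6371 km mean Earth radius and the set2 coordinates are assumptions for illustration; the set1 rows are the first three stations from the question:

```r
# Great-circle (haversine) distance in km between one point and many points.
# Inputs are decimal degrees; r = 6371 km is an assumed mean Earth radius.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# First three stations of set1 from the question; set2 is made up for the demo.
set1 <- data.frame(lon = c(13.67111, 12.86695, 15.94223),
                   lat = c(48.39167, 48.14806, 48.72111))
set2 <- data.frame(lon = c(11.0, 13.5, 16.0),
                   lat = c(47.0, 48.5, 48.5))

# For each row of set1, the row index in set2 with the smallest distance.
nearest_idx <- vapply(seq_len(nrow(set1)), function(i) {
  which.min(haversine_km(set1$lon[i], set1$lat[i], set2$lon, set2$lat))
}, integer(1))

set1$nearest_in_set2 <- nearest_idx
set1$dist_km <- haversine_km(set1$lon, set1$lat,
                             set2$lon[nearest_idx], set2$lat[nearest_idx])
set1
```

    Like the apply version, this compares every pair of stations, so it is only practical for small to moderate data sets.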
    
  • 2020-12-19 09:20

    If you have extremely large datasets, using a distance command can be cumbersome as it must calculate the distance to all points in the alternative data for each point in the reference data. The 'ann' command from the 'yaImpute' package is a very fast approximate nearest-neighbour routine that is good for large distance calculations. It will return however many "closest" records you want (the value of k) as well as the distance to each of them.

    Note: despite being an approximate nearest neighbour, the results are stable on repeated runs of the same data. It doesn't include a random selection of points or anything. See documentation.

    FWIW, I'm really not kidding about fast. I've used this to find knn distances for two matrices, each with millions of points. Making a distance matrix for this or doing it iteratively row-by-row is either unfeasible or painfully slow.

    Quick example:

    # Hypothetical coordinate data
    set.seed(2187); foo1 <- round(abs(data.frame(x=runif(5), y=runif(5))*100))
    set.seed(2187); foo2 <- round(abs(data.frame(x=runif(10), y=runif(10))*100))
    foo1; foo2
    
    # the 'ann' command from the 'yaImpute' package
    install.packages("yaImpute")
    library(yaImpute)
    
    # Approximate nearest-neighbour search, reporting the 3 nearest points (k=3)
    # This command finds the 3 nearest points in foo2 for each point in foo1
    # In the output:
    #   The first k columns are the row numbers of the points
    #   The next k columns (k+1 to 2k) are the *squared* euclidean distances
    knn.out <- ann(as.matrix(foo2), as.matrix(foo1), k=3)
    knn.out$knnIndexDist
    
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    5    4  729 1658 2213
    [2,]    2    3    7   16  100 1025
    [3,]    9    7    5   40   81  740
    [4,]    4    1    6   16  580  673
    [5,]    5    7    9    0  677  980
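
    Since the distance columns are squared, a small sketch of splitting the output into indices and plain distances may help. The matrix is re-entered by hand from the printed output so the snippet runs without yaImpute; the names knn_index_dist, nn_index, nn_dist and nearest are illustrative only:

```r
# knnIndexDist as printed above, typed in by hand (k = 3).
k <- 3
knn_index_dist <- matrix(c(1, 5, 4, 729, 1658, 2213,
                           2, 3, 7,  16,  100, 1025,
                           9, 7, 5,  40,   81,  740,
                           4, 1, 6,  16,  580,  673,
                           5, 7, 9,   0,  677,  980),
                         ncol = 2 * k, byrow = TRUE)

# Columns 1..k: row numbers in foo2; columns k+1..2k: squared distances.
nn_index <- knn_index_dist[, 1:k, drop = FALSE]
nn_dist  <- sqrt(knn_index_dist[, (k + 1):(2 * k), drop = FALSE])

# Nearest neighbour only: one foo2 row and one distance per row of foo1.
nearest <- data.frame(foo1_row = seq_len(nrow(nn_index)),
                      foo2_row = nn_index[, 1],
                      distance = nn_dist[, 1])
nearest
```

    With a real result, knn.out$knnIndexDist would take the place of the hand-typed matrix.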
    

    https://cran.r-project.org/web/packages/yaImpute/index.html
