How can I calculate distance between multiple latitude and longitude data?

Submitted by 强颜欢笑 on 2021-02-05 11:28:46

Question


I have location data (latitude and longitude) for 1100 stations and 10000 houses. Is it possible to calculate, in R, the shortest distance from each house to any station? I also want to know which station gives that shortest distance for each house.


Answer 1:


Here's a toy example that computes all pairwise distances between m points and n cities. It should translate directly to your station/house problem.

I brought up worldcities, spun the globe (so to speak), and stopped on four cities. I then spun again and stopped at two points. The two counts here are immaterial: if we have 4 and 2 or 1100 and 10000, it should not matter much.

worldcities <- read.csv(header = TRUE, stringsAsFactors = FALSE, text = "
lat,lon
39.7642548,-104.9951942
48.8588377,2.2770206
26.9840891,49.4080842
13.7245601,100.493026")

coords <- read.csv(header = TRUE, stringsAsFactors = FALSE, text = "
lat,lon
27.9519571,66.8681431
40.5351151,-108.4939948")

(A quick note ... often, tools give us coordinates as "latitude, longitude", at least in my experience. geosphere functions, however, assume "longitude, latitude". My coordinates above were copied straight from random views in Google Maps, and I didn't want to edit them; because of this, I reverse the columns below with [,2:1] column indexing. If you forget and give coordinates that are undeniably not correct, you'll get the error Error in .pointsToMatrix(p1) : latitude < -90, which should be a prod that you have likely reversed the order of your coordinates. At which point you scratch your head and wonder if all of your other projects have used the wrong coordinates, calling into question your conclusions. Not me, I've never been there. This year.)
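As a small guard against the reversed-column mistake described above, here is a minimal sketch of a range check in base R. The helper name check_lat_lon is hypothetical (it is not part of geosphere), and the check only catches swaps where a longitude falls outside ±90; a swapped pair whose values all lie within ±90 would still pass.

```r
# Hypothetical helper: TRUE when 'lat' and 'lon' columns hold plausible values.
check_lat_lon <- function(df) {
  stopifnot(all(c("lat", "lon") %in% names(df)))
  all(abs(df$lat) <= 90) && all(abs(df$lon) <= 180)
}

# The two points from the example above.
coords <- data.frame(lat = c(27.9519571, 40.5351151),
                     lon = c(66.8681431, -108.4939948))

# Simulate the mistake: longitudes end up in the 'lat' column.
swapped <- setNames(coords[, 2:1], c("lat", "lon"))

ok_original <- check_lat_lon(coords)   # TRUE
ok_swapped  <- check_lat_lon(swapped)  # FALSE, since |-108.49| > 90
```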

Let's find the distance in meters between each of coords (each row) and each city (each column):

dists <- outer(seq_len(nrow(coords)), seq_len(nrow(worldcities)),
               function(i, j) geosphere::distHaversine(coords[i,2:1], worldcities[j,2:1]))
dists
#            [,1]    [,2]     [,3]     [,4]
# [1,] 12452329.0 5895577  1726433  3822220
# [2,]   309802.8 7994185 12181477 13296825
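For reference, geosphere also provides distm(), which builds a distance matrix directly from two sets of points. The sketch below assumes the geosphere package is installed (hence the requireNamespace guard) and selects the columns by name so the "longitude, latitude" order is explicit. It should give a matrix of the same shape as the outer() call above, one row per point and one column per city.

```r
worldcities <- data.frame(
  lat = c(39.7642548, 48.8588377, 26.9840891, 13.7245601),
  lon = c(-104.9951942, 2.2770206, 49.4080842, 100.493026)
)
coords <- data.frame(lat = c(27.9519571, 40.5351151),
                     lon = c(66.8681431, -108.4939948))

# Guard so the sketch is skipped quietly when geosphere is not installed.
if (requireNamespace("geosphere", quietly = TRUE)) {
  # distm() expects longitude first, then latitude.
  dists_m <- geosphere::distm(
    as.matrix(coords[, c("lon", "lat")]),
    as.matrix(worldcities[, c("lon", "lat")]),
    fun = geosphere::distHaversine
  )
}
```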

It should be straightforward to find which city is closest to each coordinate, with

apply(dists, 1, which.min)
# [1] 3 1

That is, the first point is closest to the third city, and the second point is closest to the first city.
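Since the question asks for both the nearest station and the distance to it, here is a minimal sketch that pulls the two together into one table. To keep it self-contained, dists is rebuilt from the values printed above; the column names in the result are illustrative assumptions.

```r
# Distance matrix from the output above (rows = points, columns = cities).
dists <- matrix(c(12452329.0, 309802.8,
                  5895577,    7994185,
                  1726433,    12181477,
                  3822220,    13296825), nrow = 2)

# Column index of the closest city for each row.
nearest <- apply(dists, 1, which.min)

# Pick dists[i, nearest[i]] for every row with matrix indexing.
min_dist <- dists[cbind(seq_len(nrow(dists)), nearest)]

result <- data.frame(point = seq_len(nrow(dists)),
                     nearest_city = nearest,
                     distance_m = min_dist)
```

For the station/house problem, point would be the house row and nearest_city the row number of the closest station.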

Just to prove this is a tenable solution for a large number of pairs, here's the same problem scaled up a bit.

worldcities_big <- do.call(rbind, replicate(250, worldcities, simplify = FALSE))
nrow(worldcities_big)
# [1] 1000
coords_big <- do.call(rbind, replicate(5000, coords, simplify = FALSE))
nrow(coords_big)
# [1] 10000
system.time(
  dists <- outer(seq_len(nrow(coords_big)), seq_len(nrow(worldcities_big)),
                 function(i, j) geosphere::distHaversine(coords_big[i,2:1], worldcities_big[j,2:1]))
)
#    user  system elapsed 
#   67.62    2.22   70.03 

So yes, it was not instantaneous, but 70 seconds is not horrible for 10,000,000 distance calculations. Could you make it faster? Perhaps, though I'm not sure precisely how to do it easily. I'd think some heuristics might reduce it from O(m*n) to O(m*log(n)) time, but I don't know if that's worth the coding complexity they would introduce.
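One possible direction, short of changing the O(m*n) complexity, is to compute the haversine formula directly on whole matrices in base R. This is a sketch rather than a drop-in for geosphere: haversine_matrix is a hypothetical helper, it assumes the same sphere radius as geosphere's default (6378137 m), and it has not been timed here. Memory use grows with m*n, since several matrices of that size are held at once.

```r
# Hypothetical helper: all pairwise haversine distances in meters.
# Rows correspond to (lat1, lon1), columns to (lat2, lon2); inputs in degrees.
haversine_matrix <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- outer(phi1, phi2, "-")                    # latitude differences
  dlam <- outer(lon1 * to_rad, lon2 * to_rad, "-")  # longitude differences
  a <- sin(dphi / 2)^2 + outer(cos(phi1), cos(phi2)) * sin(dlam / 2)^2
  # Clamp to 1 to guard against rounding slightly above the asin() domain.
  2 * r * asin(pmin(sqrt(a), 1))
}

worldcities <- data.frame(
  lat = c(39.7642548, 48.8588377, 26.9840891, 13.7245601),
  lon = c(-104.9951942, 2.2770206, 49.4080842, 100.493026)
)
coords <- data.frame(lat = c(27.9519571, 40.5351151),
                     lon = c(66.8681431, -108.4939948))

d <- haversine_matrix(coords$lat, coords$lon, worldcities$lat, worldcities$lon)
nearest <- apply(d, 1, which.min)
```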



Source: https://stackoverflow.com/questions/60049868/how-can-i-calculate-distance-between-multiple-latitude-and-longitude-data
