R calculate distance based on latitude-longitude from two data frames

Submitted by 风格不统一 on 2021-02-08 11:20:13

Question


I am trying to replace values in one data frame with values from another data frame, based on a condition.

Both data frames contain latitude, longitude and height, but one of them is shorter. For each point in the shorter data frame (5103 rows), I want to find the closest point by latitude and longitude (by calculating the distance) in the longer one (188426 rows), and then replace the height value in the longer data frame with the height from the shorter one.

The first data frame is topo.rams in the code below and the second is topo.msg. The final goal is to replace the height in topo.msg with the height values from topo.rams.

topo.rams:
longitud,latitud,tempc,u,v,w,relhum,speed,topo
-1.7107, 38.1464, 18.2412, -6.1744, -0.3708, 0.0000, 58.6447, 6.3584,460.5908
-1.7107, 38.1734, 18.5915, -5.7757, -0.3165, 0.0000, 61.8492, 5.9840,416.0403

topo.msg:
height,longitud,latitud
448.0, 1.70, 38.14
402.0, 1.70, 38.18

and the desired output (topo.msg modified)

height,longitud,latitud
460.5908, 1.70, 38.14
416.0403,  1.70, 38.18

and the code used

# read the data
topo.msg=read.csv("MSG_DEM.txt",sep=",",header=FALSE)
colnames(topo.msg) <- c("topoMSG","longitud","latitud")

topo.rams=read.csv("topografia-rams.txt",sep=",",header=TRUE)

# number of points to process
puntos.rams=dim(topo.rams)[1]
puntos.msg=dim(topo.msg)[1]

# Locate the MSG point closest to each station.
# The distance is computed from the lat-lon coordinates

topo.temp=data.frame()

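for(i in 1:puntos.rams)
{
  # distance from RAMS point i to each MSG point;
  # Inf marks points outside the 0.5 degree window
  distancia <- rep(Inf, puntos.msg)
  for(j in 1:puntos.msg)
  {
    dlon <- topo.rams$longitud[i] - topo.msg$longitud[j]
    dlat <- topo.rams$latitud[i] - topo.msg$latitud[j]

    if (abs(dlon) < 0.5 && abs(dlat) < 0.5) {
      distancia[j] <- sqrt(dlon*dlon + dlat*dlat)
    }
  }
  indexj <- which.min(distancia)

  # only overwrite when some MSG point fell inside the window
  if (is.finite(distancia[indexj])) {
    topo.msg$topoMSG[indexj] <- topo.rams$topo[i]
  }

}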

This code seems to run, but it takes a very long time. I have also tried to create a distance matrix with the geosphere package, following the post "Geographic distance between 2 lists of lat/lon coordinates", but R complains about allocating 3.6 Gb.
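
For scale, here is a back-of-envelope count of the pairwise distances involved, using the row counts given above. It only illustrates why a full distance matrix is large; it does not identify which allocation inside geosphere actually failed, and the 4-byte and 8-byte element sizes are assumptions about how such a vector might be stored.

```r
n_rams <- 5103     # rows in topo.rams (from the question)
n_msg  <- 188426   # rows in topo.msg (from the question)

# one entry per (RAMS point, MSG point) pair
n_pairs <- n_rams * n_msg

# size in GiB if each entry takes 4 bytes (about 3.6)
gib_4byte <- n_pairs * 4 / 1024^3

# size in GiB if each entry is an 8-byte double (about 7.2)
gib_8byte <- n_pairs * 8 / 1024^3

print(c(pairs = n_pairs, gib_4byte = gib_4byte, gib_8byte = gib_8byte))
```

Processing one RAMS point at a time only ever needs a vector of 188426 distances, which is why a row-by-row approach avoids the allocation problem.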

How can I address this issue? I would like to either optimize the loop or use a distance matrix. Surely there is a cleaner, more efficient way to calculate the distances.

Thanks in advance


Answer 1:


Following the comment by Patric, I switched from the nested loop to a vectorized computation: for each RAMS point, the distances to all MSG points are computed in one step. The code now runs, and it is simpler and more efficient.

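for(i in seq_len(puntos.rams))
{
  # differences between RAMS point i and every MSG point at once
  dlon <- topo.rams$longitud[i] - topo.msg$longitud
  dlat <- topo.rams$latitud[i] - topo.msg$latitud
  distancia <- sqrt(dlon*dlon + dlat*dlat)
  indexj <- which.min(distancia)
  # overwrite the MSG height at the nearest point
  topo.msg$topoMSG[indexj] <- topo.rams$topo[i]
}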
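
The loop above can be tried on a small example. The sketch below is self-contained and uses two made-up miniature data frames (the values are hypothetical, loosely modelled on the samples in the question, with negative longitudes in both). The helper name nearest_msg is also an assumption, not something from the original code.

```r
# Hypothetical miniature versions of the two data frames (values made up).
topo.rams <- data.frame(
  longitud = c(-1.7107, -1.7107),
  latitud  = c(38.1464, 38.1734),
  topo     = c(460.5908, 416.0403)
)
topo.msg <- data.frame(
  topoMSG  = c(448.0, 402.0, 390.0),
  longitud = c(-1.70, -1.70, -1.60),
  latitud  = c(38.14, 38.18, 38.30)
)

# For one RAMS point, return the row of msg with the smallest planar
# distance in degrees. The squared distance has the same minimum as the
# distance itself, so sqrt() is skipped.
nearest_msg <- function(lon, lat, msg) {
  which.min((lon - msg$longitud)^2 + (lat - msg$latitud)^2)
}

# One index per RAMS row; only a vector of length nrow(topo.msg)
# is held in memory at any time.
idx <- vapply(
  seq_len(nrow(topo.rams)),
  function(i) nearest_msg(topo.rams$longitud[i], topo.rams$latitud[i], topo.msg),
  integer(1)
)

# Overwrite the MSG heights at the matched rows.
topo.msg$topoMSG[idx] <- topo.rams$topo
print(topo.msg)
```

If several RAMS points map to the same MSG row, the last one wins, which matches the behaviour of the loop.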

There's probably a more elegant way to do this calculation. I would appreciate any input.
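
One possible refinement, offered as a sketch rather than as part of the original answer: the distance above is measured in degrees, and away from the equator one degree of longitude is shorter than one degree of latitude (by a factor of about cos(latitude), roughly 0.79 at 38° N). On a dense grid the nearest point will usually be the same either way, but a great-circle distance removes the distortion. The function below is a plain base R haversine formula; the name haversine_km, the mean Earth radius of 6371 km, and the sample points are all assumptions made for illustration.

```r
# Great-circle distance in km from one point to a vector of points.
# Inputs are in decimal degrees; radius_km is an assumed mean Earth radius.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical candidate points (values made up) around one query point.
msg <- data.frame(longitud = c(-1.70, -1.70, -1.60),
                  latitud  = c(38.14, 38.18, 38.30))

d <- haversine_km(-1.7107, 38.1464, msg$longitud, msg$latitud)
nearest <- which.min(d)
print(nearest)
```

Because lon2 and lat2 can be whole columns, this function could replace the sqrt(dlon*dlon+dlat*dlat) line inside the per-point loop without changing its structure.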



Source: https://stackoverflow.com/questions/34198665/r-calculate-distance-based-on-latitude-longitude-from-two-data-frames
