Calculate distance between each pair of coordinates in a wide dataframe

浪子不回头ぞ submitted on 2019-12-03 16:16:57

The problem you're having is that apply(...) coerces its first argument to a matrix. By definition, all elements of a matrix must have the same data type. Since one of the columns in dat (dat$subcounty) is character, apply(...) coerces everything to character. In your test dataset every column was numeric, so you didn't hit this problem.
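The coercion is easy to reproduce in isolation. Here is a minimal sketch using a small made-up data frame (`df` and its column names are hypothetical, not the asker's data) that shows what each row looks like by the time it reaches the function passed to apply():

```r
# Hypothetical data frame: one character column next to two numeric ones
df <- data.frame(name = c("a", "b"),
                 x = c(1.5, 2.5),
                 y = c(10, 20),
                 stringsAsFactors = FALSE)

# apply() converts a data frame with as.matrix(); a single character
# column is enough to make the whole matrix character
m <- as.matrix(df)
matrix_type <- typeof(m)

# Every row is handed to the function as a character vector
row_types <- unname(apply(df, 1, function(r) class(r)))

# Selecting only the numeric columns first keeps the values numeric
num_types <- unname(apply(df[, c("x", "y")], 1, function(r) class(r)))

print(matrix_type)
print(row_types)
print(num_types)
```

Indexing the data frame row by row, as in the fix that follows, avoids the problem because only the numeric coordinate columns are ever converted to a matrix.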

This should work:

library(sp)

# columns 3:4 are pro.long/pro.lat, columns 5:6 are sub.long/sub.lat
dat$dist.km <- sapply(seq_len(nrow(dat)), function(i)
                spDistsN1(as.matrix(dat[i,3:4]), as.matrix(dat[i,5:6]), longlat=TRUE))

There is a much faster solution using data.table and geosphere. distGeo() returns metres, so the result is divided by 1000 to get kilometres.

library(data.table)
library(geosphere)

setDT(dat)[ , dist_km := distGeo(matrix(c(pro.long, pro.lat), ncol = 2), 
                                  matrix(c(sub.long, sub.lat), ncol = 2))/1000] 
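Most of the speed-up comes from passing whole coordinate columns in a single call instead of calling a distance function once per row. As a rough, dependency-free sketch of that pattern, the helper below (`haversine_km` is a hypothetical name, not part of geosphere or sp) uses the haversine formula on a sphere of radius 6371 km. This is a swapped-in approximation, not the ellipsoidal calculation distGeo performs, so results can differ slightly.

```r
# Hypothetical helper: vectorized haversine distance in kilometres.
# Spherical approximation (mean Earth radius assumed to be 6371 km).
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# First two rows of the sample data, coordinate columns only
dat_demo <- data.frame(
  pro.long = c(33.47627919, 31.73605491),
  pro.lat  = c(2.73996953, 3.26530095),
  sub.long = c(33.47552, 31.78307),
  sub.lat  = c(2.740362, 3.391209))

# One call handles every row; no sapply() over row indices is needed
dat_demo$dist_km <- with(dat_demo,
                         haversine_km(pro.long, pro.lat, sub.long, sub.lat))
print(dat_demo)
```

For real work distGeo (or spDistsN1) is the safer choice; the sketch only illustrates why column-wise calls scale better than row-wise ones.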

Benchmark:

library(sp)

jlhoward <- function(dat) { dat$dist.km <- sapply(1:nrow(dat),function(i)
                             spDistsN1(as.matrix(dat[i,3:4]),as.matrix(dat[i,5:6]),longlat=T)) }

rafa.pereira <- function(dat2) { setDT(dat2)[ , dist_km := distGeo(matrix(c(pro.long, pro.lat), ncol = 2), 
                                                                 matrix(c(sub.long, sub.lat), ncol = 2))/1000] }


> system.time( jlhoward(dat) )
   user  system elapsed 
   8.94    0.00    8.94 

> system.time( rafa.pereira(dat) )
   user  system elapsed 
   0.07    0.00    0.08 

Data

dat <- structure(list(ID = 1:4, 
                      subcounty = c("a", "b", "c", "d"), 
                      pro.long = c(33.47627919, 31.73605491, 31.54073482, 31.51748984), 
                      pro.lat = c(2.73996953, 3.26530095, 3.21327597, 3.17784981), 
                      sub.long = c(33.47552, 31.78307, 31.53083, 31.53083), 
                      sub.lat = c(2.740362, 3.391209, 3.208736, 3.208736)), 
                 .Names = c("ID", "subcounty", "pro.long", "pro.lat", "sub.long", "sub.lat"),     
                 row.names = c(NA, 4L), class = "data.frame")

# enlarge dataset to 40,000 pairs
dat <- dat[rep(seq_len(nrow(dat)), 10000), ]