calculating the euclidean dist between each row of a dataframe with all other rows in another dataframe

北城以北 提交于 2019-12-01 03:53:37

Expanding on my comment on the question, a pretty fast approach would be the following, although with 40,000 rows you'll have to wait a bit, I guess:

unlist(lapply(seq_len(nrow(y)), function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
#[1] 5.196152 5.385165 4.898979 4.898979 5.385165

And a comparing benchmarking:

x = matrix(runif(1e2*5), 1e2)
y = matrix(runif(1e2*5), 1e2)
library(microbenchmark)
alex = function() unlist(lapply(seq_len(nrow(y)), 
                           function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
jlhoward = function() apply(y,1,function(y)
                                  min(apply(x,1,function(x,y)dist(rbind(x,y)),y)))
all.equal(alex(), jlhoward())
#[1] TRUE
microbenchmark(alex(), jlhoward(), times = 20)
#Unit: milliseconds
#       expr        min         lq     median         uq        max neval
#     alex()   3.369188   3.479011   3.600354   4.513114   4.789592    20
# jlhoward() 422.198621 431.565643 436.561057 442.643181 602.929742    20

Try this:

apply(y,1,function(y) min(apply(x,1,function(x,y)dist(rbind(x,y)),y)))
# [1] 5.196152 5.385165 4.898979 4.898979 5.385165

Working from the inside out, we bind a row of x to a row of y and calcualte the distance between them usin the dist(...) function (written in C). We do this for a given row of y with each row of x in turn, using the inner apply(...), and then find the minimum of the result. Then we do this for each row of y in the outer call to apply(...).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!