Find the minimum distance between two data frames, for each element in the second data frame

Posted by 我的梦境 on 2019-11-30 10:01:45
Arun

Here's how I'd do it using data.table:

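library(data.table)
# Key ev1 by test_id so that ev1[ev2] joins on that column
setkey(setDT(ev1), test_id)
# For each row of ev2 (by = .EACHI), keep the ev1 time closest to that row's time
ev1[ev2, .(ev2.time = i.time, ev1.time = time[which.min(abs(i.time - time))]), by = .EACHI]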
#    test_id ev2.time ev1.time
# 1:       0        6        3
# 2:       0        1        1
# 3:       0        8        3
# 4:       1        4        4
# 5:       1        5        4
# 6:       1       11        4

In joins of the form x[i] in data.table, the prefix i. is used to refer to the columns of i when x and i share the same name for a particular column.

Please see this SO post for an explanation of how this works.
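To make the i. prefix concrete, here is a minimal sketch. The sample data below is made up for illustration (the question's actual ev1 and ev2 are not reproduced here), and the column name n_candidates is my own:

```r
library(data.table)

# Hypothetical sample data, not taken from the original question
ev1 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))
setkey(ev1, test_id)

# Plain join: both tables have a `time` column, so the one coming from
# i (ev2) is returned as `i.time`, while `time` stays ev1's column.
joined <- ev1[ev2, allow.cartesian = TRUE]
names(joined)
# [1] "test_id" "time"    "i.time"

# The same prefix is available inside j. With by = .EACHI, j runs once per
# ev2 row, and .N is the number of ev1 rows matched by that row.
counts <- ev1[ev2, .(ev2.time = i.time, n_candidates = .N), by = .EACHI]
print(counts)
```

With this data every ev2 row has three candidate rows in ev1, which is the set that which.min() searches in the answer above.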

This is syntactically more straightforward, since the code states directly what is going on, and it is memory efficient (at the expense of a little speed, see note 1) because it never materialises the entire join result. In fact, it does exactly what you describe in your post: it filters on the fly, while merging.

  1. On speed, it really doesn't matter in most cases. If there are a lot of rows in i, this approach might be a tad slower, as the j-expression has to be evaluated once for each row in i. In contrast, @akrun's answer does a cartesian join followed by a single filtering step. So while it's high on memory, it doesn't evaluate j for each row in i. But again, this shouldn't matter unless you work with a really large i, which is not often the case.
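A rough way to see the memory trade-off, as a sketch only: the data is again hypothetical, and the renamed columns ev1.time / ev2.time are just my choice to keep the grouped step easy to read.

```r
library(data.table)

# Hypothetical sample data, chosen only for illustration
ev1 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))
setkey(ev1, test_id)

# Filter while joining: j runs once per ev2 row, so the result has
# nrow(ev2) rows and no larger intermediate table is built.
each_i <- ev1[ev2, .(ev2.time = i.time,
                     ev1.time = time[which.min(abs(i.time - time))]),
              by = .EACHI]

# Join first, filter afterwards: every candidate pair is materialised.
full <- ev1[ev2, allow.cartesian = TRUE]
setnames(full, c("time", "i.time"), c("ev1.time", "ev2.time"))
filtered <- full[, .(ev1.time = ev1.time[which.min(abs(ev2.time - ev1.time))]),
                 by = .(test_id, ev2.time)]

nrow(full)    # 18 intermediate rows: 3 candidates for each of the 6 ev2 rows
nrow(each_i)  # 6
all.equal(each_i$ev1.time, filtered$ev1.time)  # TRUE
```

Both routes pick the same nearest times here. The difference is only the size of the intermediate table, which grows with the number of matches per row of i.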

HTH

Maybe this helps:

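library(data.table)
setkey(setDT(ev1), test_id)
# Cartesian join: every ev2 row is paired with every ev1 row of the same test_id
DT <- ev1[ev2, allow.cartesian=TRUE][,distance:=i.time-time]
# Keep the rows whose absolute distance is the smallest within each (test_id, i.time) group
DT[DT[,abs(distance)==min(abs(distance)), by=list(test_id, i.time)]$V1]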
#    test_id time i.time distance
#1:       0    3      6        3
#2:       0    1      1        0
#3:       0    3      8        5
#4:       1    4      4        0
#5:       1    4      5        1
#6:       1    4     11        7

Or

 ev1[ev2, allow.cartesian=TRUE][,distance:= i.time-time][,
      .SD[abs(distance)==min(abs(distance))], by=list(test_id, i.time)]

Update

Using the new grouping

setkey(setDT(ev1), test_id, group_id)
setkey(setDT(ev2), test_id, group_id)
DT <- ev1[ev2, allow.cartesian=TRUE][,distance:=i.time-time]
DT[DT[,abs(distance)==min(abs(distance)), by=list(test_id, 
                                group_id,i.time)]$V1]$distance
#[1]  2  3  4 -1  0  4

For comparison, the code you provided gives the same distances:

min_data$distance
#[1]  2  3  4 -1  0  4
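
If you want to try the grouped version without the original data, here is a small self-contained sketch. The tables are made up (so the distances differ from the ones above), and the variable keep is just a name I introduced for the logical filter:

```r
library(data.table)

# Hypothetical data with an extra group_id column (not the question's data)
ev1 <- data.table(test_id  = c(0, 0, 0, 1, 1, 1),
                  group_id = c(1, 1, 2, 1, 2, 2),
                  time     = c(1, 4, 9, 2, 3, 8))
ev2 <- data.table(test_id  = c(0, 0, 1, 1),
                  group_id = c(1, 2, 1, 2),
                  time     = c(3, 5, 6, 4))

# Key both tables on the two columns so the join matches on the pair
setkey(ev1, test_id, group_id)
setkey(ev2, test_id, group_id)

# Signed distance from each candidate ev1 time to the ev2 time
DT <- ev1[ev2, allow.cartesian = TRUE][, distance := i.time - time]

# TRUE for the closest candidate(s) of each ev2 row. The flags line up with
# the rows of DT because each group's rows are contiguous in the join result.
keep <- DT[, abs(distance) == min(abs(distance)),
           by = .(test_id, group_id, i.time)]$V1
DT[keep]$distance
# [1] -1 -4  4  1
```

Note that the == min(...) filter keeps every tied row, so an ev2 row with two equally close candidates would appear twice.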