Finding closest matching time for each patient

Submitted by 淺唱寂寞╮ on 2019-11-30 22:05:06

I'd use data.table, first cleaning up by converting the time columns to ITime and keeping only the last row for each patient:

library(data.table)
setDT(data1)[, arrival := as.ITime(as.character(arrival))]
setDT(data2)[, availableSlot := as.ITime(as.character(availableSlot))]
DT1 = unique(data1, by="patient", fromLast=TRUE)
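
To see what the unique() call does in isolation, here is a minimal sketch on made-up data. The table visits and its contents are hypothetical, chosen only to mimic the shape of data1:

```r
library(data.table)

# Hypothetical toy table: patient A has two rows, patient B has one
visits = data.table(patient = c("A", "A", "B"),
                    lastRow = c("", "Yes", "Yes"))

# One row per patient; fromLast=TRUE keeps the last occurrence instead of the first
lastVisits = unique(visits, by = "patient", fromLast = TRUE)
```

Here lastVisits should have one row per patient, and in this toy data that row is the one flagged "Yes".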

Then you can do a "rolling join":

res = data2[DT1, on=.(patient, availableSlot = arrival), roll="nearest", 
  .(patient, availableSlot = x.availableSlot)]

#    patient availableSlot
# 1:       A      11:15:00
# 2:       B      12:55:00
# 3:       C      14:00:00

How it works

The syntax is x[i, on=, roll=, j].

  • on= are the merge-by columns.
  • It's a join: for each row of i, we are looking for matches in x.
  • With roll="nearest", the final column in the on= is "rolled" to its nearest match.
  • The on= columns in the original tables can be referenced with x.* and i.* prefixes.
  • The j argument should give a list of columns, and .() is an alias for list() here.
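
The points above can be tried on a small self-contained example. This is only a sketch: the tables slots and visits and their values are invented, and times are stored as integer minutes since midnight to keep the arithmetic easy to check by hand.

```r
library(data.table)

# Hypothetical toy data: times stored as minutes since midnight
slots  = data.table(patient = c("A", "A", "B", "B"),
                    slot    = c(600L, 675L, 775L, 840L))
visits = data.table(patient = c("A", "B"),
                    arrival = c(660L, 780L))

# x = slots, i = visits: one result row per row of visits.
# patient must match exactly; slot (the last column in on=) rolls
# to whichever value is nearest to arrival.
nearest = slots[visits, on = .(patient, slot = arrival), roll = "nearest",
                .(patient, arrival = i.arrival, slot = x.slot)]
```

For patient A the arrival at 660 is 15 minutes from the 675 slot and 60 from the 600 slot, so 675 is picked; for patient B, 775 is picked over 840. Using x.slot in j matters: a bare slot would show the arrival time from i rather than the matched value from x.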

Check out the package's introductory materials at http://r-datatable.com/Getting-started and type ?data.table for the docs relevant to rolling joins.


I would stop at res, but if you really want it back in your original table...

# a very nonstandard step:
data1[lastRow == "Yes", availableSlot := res$availableSlot ]

#    patient  arrival lastRow availableSlot
# 1:       A 11:00:00                  <NA>
# 2:       A 11:00:00     Yes      11:15:00
# 3:       B 13:00:00                  <NA>
# 4:       B 13:00:00     Yes      12:55:00
# 5:       C 14:00:00                  <NA>
# 6:       C 14:00:00                  <NA>
# 7:       C 14:00:00                  <NA>
# 8:       C 14:00:00     Yes      14:00:00

Now, data1 has availableSlot in a new column, similar to when you do data1$col <- val.
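
The assignment above is positional: it relies on the rows of res being in the same order as the flagged rows of data1. If that is a concern, one possible alternative is to look up each value by patient with match(). A minimal sketch, with hypothetical stand-ins (visits for data1, matched for res) and times kept as plain strings:

```r
library(data.table)

# Hypothetical stand-ins for data1 and res
visits  = data.table(patient = c("A", "A", "B", "B"),
                     lastRow = c("", "Yes", "", "Yes"))
matched = data.table(patient = c("B", "A"),   # deliberately not in visits' order
                     slot    = c("12:55", "11:15"))

# Look up each flagged row's slot by patient rather than by position
visits[lastRow == "Yes", slot := matched$slot[match(patient, matched$patient)]]
```

Rows that are not flagged stay NA, as in the output above.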

d.b

Here is a solution (based on joel.wilson's answer to my question) that works with base R:

#Convert times to POSIXct format (the date part defaults to today's date)
data1$arrival = as.POSIXct(data1$arrival, format = "%H:%M")
data2$availableSlot = as.POSIXct(data2$availableSlot, format = "%H:%M")

#Lookup times from data2$availableSlot closest to data1$arrival
#(sapply drops the POSIXct class, so the result is seconds since 1970-01-01)
data1$availableSlot = sapply(data1$arrival, function(x)
                    data2$availableSlot[which.min(abs(x - data2$availableSlot))])

#Keep just hour and minutes
data1$availableSlot = strftime(as.POSIXct(data1$availableSlot, 
                                origin = "1970-01-01"), format = "%H:%M")
data1$arrival = strftime(as.POSIXct(data1$arrival), format = "%H:%M")

#Blank out the slot on rows where lastRow is not "Yes"
data1$availableSlot[which(data1$lastRow != "Yes")] = ""
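
Note that the lookup above compares each arrival against every value in data2$availableSlot, regardless of patient. If slots are meant to be matched within each patient, as in the data.table join, something along these lines might work in base R. It is a sketch under assumptions: the data frames visits and slots and the helper to_minutes are hypothetical, times are "HH:MM" strings, and every patient is assumed to have at least one slot.

```r
# Hypothetical toy data, times as "HH:MM" strings
visits = data.frame(patient = c("A", "B"), arrival = c("11:00", "13:00"),
                    stringsAsFactors = FALSE)
slots  = data.frame(patient = c("A", "A", "B", "B"),
                    slot    = c("10:00", "11:15", "12:55", "14:00"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: convert "HH:MM" to minutes since midnight
to_minutes = function(x) {
  parts = strsplit(x, ":", fixed = TRUE)
  sapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]))
}

# For each visit, search only that patient's slots
visits$slot = mapply(function(p, a) {
  candidates = slots$slot[slots$patient == p]
  candidates[which.min(abs(to_minutes(candidates) - to_minutes(a)))]
}, visits$patient, visits$arrival, USE.NAMES = FALSE)
```

Working in minutes avoids the round trip through POSIXct and the origin= conversion, at the cost of handling only same-day times.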