I have two dataframes x and y that contain columns for ids and for dates.
id.x <- c(1, 2, 4, 5, 7, 8, 10)
date.x <- as.Date(c("2015-01-01", "201
Using the development version of data.table, v1.9.7, where non-equi (or conditional) joins were recently implemented, we can do this in a straightforward (and efficient) manner. See installation instructions here.
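require(data.table) # v1.9.7+
setDT(x); setDT(y) ## convert both data.frames to data.tables by reference
x[, date.x.plus3 := date.x + 3L] ## upper bound of the 3-day window
## for each row of x, look up rows of y with the same id and date.x <= date.y <= date.x + 3;
## x.date.y returns y's own date.y (the x. prefix refers to the outer table, here y)
y[x, .(id.x, date.x, date.y=x.date.y),
     on=.(id.y == id.x, date.y >= date.x, date.y <= date.x.plus3)]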
#    id.x     date.x     date.y
# 1:    1 2015-01-01 2015-01-03
# 2:    2 2015-01-02       <NA>
# 3:    4 2015-01-21       <NA>
# 4:    5 2015-01-13       <NA>
# 5:    7 2015-01-29 2015-01-29
# 6:    8 2015-01-01       <NA>
# 7:   10 2015-01-03       <NA>
Solutions that join on a dummy column and then filter based on the conditions are generally not scalable (as the number of rows quickly explodes), and solutions that loop through rows and run the filtering condition for each row are slow, well, because they perform the operation row-wise.
This solution does neither: it performs the conditional join directly, and should therefore perform well in terms of both runtime and memory.
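To make the contrast concrete, here is a minimal, self-contained sketch. The toy tables below are hypothetical (they are not the data from the question), and the helper column k and the names res, cj and hits are introduced only for illustration. It runs the same non-equi join and then, for comparison, builds the dummy-column cross join that the direct approach avoids.

```r
library(data.table)

# Hypothetical toy data, made up for this sketch (not the question's data)
x <- data.table(id.x   = c(1, 2, 3),
                date.x = as.Date(c("2015-01-01", "2015-01-10", "2015-01-20")))
y <- data.table(id.y   = c(1, 1, 2, 3),
                date.y = as.Date(c("2015-01-03", "2015-01-08",
                                   "2015-01-20", "2015-01-20")))

# Upper bound of the 3-day window
x[, date.x.plus3 := date.x + 3L]

# Non-equi join: one lookup per row of x, no intermediate cross product
res <- y[x, .(id.x, date.x, date.y = x.date.y),
         on = .(id.y == id.x, date.y >= date.x, date.y <= date.x.plus3)]
print(res)

# For contrast: join on a dummy column k (hypothetical helper), then filter.
# The intermediate table has nrow(x) * nrow(y) rows before any filtering.
x[, k := 1L]
y[, k := 1L]
cj <- merge(x, y, by = "k", allow.cartesian = TRUE)
hits <- cj[id.x == id.y & date.y >= date.x & date.y <= date.x.plus3]
print(nrow(cj))    # size of the intermediate cross product
print(nrow(hits))  # rows that actually satisfy the conditions
```

With this toy data, ids 1 and 3 each find a date.y inside their window, while id 2 has none and gets NA. The cross join materialises 12 rows to recover the same two matches, and that intermediate size grows as the product of the two row counts.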