I am seeking help after having wasted almost a day. I have a big data frame (bdf) and a small data frame (sdf). I want to add variable z to bdf depending on the value of sdf.
Here's a solution using data.table's rolling joins:
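library(data.table)
setkey(setDT(sdf), ts)       # convert sdf to a data.table and key it by ts
sdf[bdf, roll = "nearest"]   # for each row of bdf, take the sdf row with the nearest ts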
#                      ts    y
#  1: 2013-05-19 17:11:22  0.2
#  2: 2013-05-21 06:40:58  0.2
#  3: 2013-05-22 20:10:34  0.2
#  4: 2013-05-24 09:40:10 -0.1
#  5: 2013-05-25 23:09:46 -0.1
#  6: 2013-05-27 12:39:22  0.3
#  7: 2013-05-29 02:08:58  0.3
#  8: 2013-05-30 15:38:34  0.3
#  9: 2013-06-01 05:08:10  0.3
# 10: 2013-06-02 18:37:46  0.3
setDT converts data.frame to data.table by reference.
setkey sorts the data.table by reference in increasing order by the columns provided, and marks those columns as key columns (so that we can join on those key columns later).
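To see what setDT and setkey do, here is a minimal sketch on made-up data. The data frame `df` and its values are hypothetical and are not the sdf from the question.

```r
library(data.table)

# Hypothetical toy data, deliberately unsorted
df <- data.frame(ts = c(30L, 10L, 20L), y = c(0.3, 0.1, 0.2))

setDT(df)        # df itself becomes a data.table; no copy is made
setkey(df, ts)   # rows are reordered by ts, and ts is marked as the key

is_dt   <- is.data.table(df)  # TRUE
key_col <- key(df)            # "ts"
print(df)                     # rows now appear in increasing ts order
```

Because both calls work by reference, there is no need to assign the result back to df.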
In data.table, x[i] performs a join when i is a data.table. I'll refer you to this answer to catch up on data.table joins, if you're not already familiar with them.
x[i] performs an equi-join. That is, it finds matching row indices in x for every row in i and then extracts those rows from x to return the join result along with the corresponding row from i. If a row in i has no match in x, the x columns for that row are NA by default.
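A small sketch of that equi-join behaviour, using hypothetical tables `x` and `i` (the column names id, val and w are made up for illustration):

```r
library(data.table)

# Hypothetical toy tables; x is keyed by id
x <- data.table(id = c(1L, 2L, 3L), val = c("a", "b", "c"), key = "id")
i <- data.table(id = c(2L, 4L), w = c(10, 20))

# One result row per row of i; id = 4 has no match in x, so val is NA there
res <- x[i]
print(res)

# nomatch = NULL drops the rows of i that found no match instead
res_inner <- x[i, nomatch = NULL]
print(res_inner)
```

Note that the result is driven by i: it has as many rows as i (when each row matches at most once), not as many as x.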
However, x[i, roll = .] performs a rolling join. When there's no exact match, either the last observation is carried forward (roll = TRUE or Inf), or the next observation is carried backward (roll = -Inf), or the row is rolled to the nearest value (roll = "nearest"). And in this case you require roll = "nearest" IIUC.
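To compare the roll options side by side, here is a minimal sketch on hypothetical data (the tables `x` and `i` and their values are made up). x has observations at t = 10 and t = 20, and i asks for t = 12 and t = 18, neither of which matches exactly:

```r
library(data.table)

# Hypothetical toy data
x <- data.table(t = c(10, 20), v = c("a", "b"), key = "t")
i <- data.table(t = c(12, 18))

plain   <- x[i]$v                    # NA NA: plain equi-join finds no exact match
locf    <- x[i, roll = TRUE]$v       # "a" "a": value at t = 10 is carried forward
nocb    <- x[i, roll = -Inf]$v       # "b" "b": value at t = 20 is carried backward
nearest <- x[i, roll = "nearest"]$v  # "a" "b": 12 is closer to 10, 18 is closer to 20

print(list(plain = plain, locf = locf, nocb = nocb, nearest = nearest))
```

roll = "nearest" needs a join column where distance makes sense (numeric, or date-time as in the question), so it would not apply to a character key.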
HTH