Creating variable in R data frame depending on another data frame

后悔当初 2020-12-31 20:34

I am seeking help after having wasted almost a day. I have a big data frame (bdf) and a small data frame (sdf). I want to add a variable z to bdf depending on the value of sdf.

4 Answers

暗喜 (OP)

2020-12-31 21:07
Here's a solution using data.table's rolling joins:
```
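library(data.table)
# Convert sdf to a data.table by reference and key it on ts
setkey(setDT(sdf), ts)
# For each row of bdf, return the sdf row whose ts is nearest
sdf[bdf, roll = "nearest"]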
#                      ts    y
#  1: 2013-05-19 17:11:22  0.2
#  2: 2013-05-21 06:40:58  0.2
#  3: 2013-05-22 20:10:34  0.2
#  4: 2013-05-24 09:40:10 -0.1
#  5: 2013-05-25 23:09:46 -0.1
#  6: 2013-05-27 12:39:22  0.3
#  7: 2013-05-29 02:08:58  0.3
#  8: 2013-05-30 15:38:34  0.3
#  9: 2013-06-01 05:08:10  0.3
# 10: 2013-06-02 18:37:46  0.3
```
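
The snippet above assumes sdf and bdf already exist. As a minimal self-contained sketch, here is the same join on made-up data (the values below are invented for illustration and are not the asker's data), plus one possible way to store the result as z in bdf, assuming z should simply be the y of the nearest sdf row:

```r
library(data.table)

# Made-up stand-ins for the asker's data (hypothetical values)
sdf <- data.frame(
  ts = as.POSIXct(c("2013-05-20 00:00:00", "2013-05-25 00:00:00",
                    "2013-05-30 00:00:00"), tz = "UTC"),
  y  = c(0.2, -0.1, 0.3)
)
bdf <- data.frame(
  ts = as.POSIXct(c("2013-05-19 12:00:00", "2013-05-23 00:00:00",
                    "2013-05-28 00:00:00"), tz = "UTC")
)

setkey(setDT(sdf), ts)  # key the lookup table on the timestamp
setDT(bdf)              # convert bdf by reference so := can be used

# For each row of bdf, pull the sdf row with the closest ts
joined <- sdf[bdf, roll = "nearest"]

# Assumption: z is the y of the nearest sdf row; rows come back in bdf order
bdf[, z := joined$y]
# bdf$z is now 0.2, -0.1, 0.3
```
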
- setDT converts a data.frame to a data.table by reference.
- setkey sorts the data.table by reference in increasing order by the columns provided, and marks those columns as key columns (so that we can join on those key columns later).
- In data.table, x[i] performs a join when i is a data.table. I'll refer you to this answer to catch up on data.table joins, if you're not already familiar with them.
- x[i] performs an equi-join. That is, it finds the matching row indices in x for every row in i, then extracts those rows from x and returns them along with the corresponding row from i. If a row in i has no match in x, the columns from x are NA for that row by default.
  
However, x[i, roll = .] performs a rolling join. When there's no exact match, either the last observation is carried forward (roll = TRUE or Inf), or the next observation is carried backward (roll = -Inf), or the value is rolled to the nearest observation (roll = "nearest"). In this case you need roll = "nearest", if I understand correctly.
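
To make the difference between the plain join and the rolling variants concrete, here is a small sketch on made-up tables (x, i, k and v are hypothetical names and values chosen only for illustration):

```r
library(data.table)

# Hypothetical lookup table x, keyed on a numeric column k
x <- data.table(k = c(10, 20, 30), v = c("a", "b", "c"))
setkey(x, k)

# Hypothetical query table i; only k = 20 has an exact match in x
i <- data.table(k = c(5, 12, 20, 27, 40))

equi    <- x[i]$v                    # NA  NA  "b" NA  NA   (no roll)
locf    <- x[i, roll = TRUE]$v       # NA  "a" "b" "b" "c"  (same as roll = Inf)
nocb    <- x[i, roll = -Inf]$v       # "a" "b" "b" "c" NA
nearest <- x[i, roll = "nearest"]$v  # "a" "a" "b" "c" "c"
```

With the default rollends, a forward roll leaves k = 5 as NA because nothing precedes it in x, and a backward roll leaves k = 40 as NA because nothing follows it. roll = "nearest" fills both ends.
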
HTH