Creating variable in R data frame depending on another data frame

后悔当初 2020-12-31 20:34

I'm seeking help after wasting almost a day on this. I have a big data frame (bdf) and a small data frame (sdf), and I want to add a variable z to bdf that depends on the values in sdf.

4 answers
  •  暗喜 (OP)
    2020-12-31 21:07

    Here's a solution using data.table's rolling joins:

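    library(data.table)
    # Convert sdf to a data.table (by reference) and key it on ts
    setkey(setDT(sdf), ts)
    # Rolling join: for each row of bdf, take the sdf row whose ts is nearest.
    # bdf is unkeyed, so its first column (assumed to be ts) is matched to sdf's key.
    sdf[bdf, roll = "nearest"]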
    #                      ts    y
    #  1: 2013-05-19 17:11:22  0.2
    #  2: 2013-05-21 06:40:58  0.2
    #  3: 2013-05-22 20:10:34  0.2
    #  4: 2013-05-24 09:40:10 -0.1
    #  5: 2013-05-25 23:09:46 -0.1
    #  6: 2013-05-27 12:39:22  0.3
    #  7: 2013-05-29 02:08:58  0.3
    #  8: 2013-05-30 15:38:34  0.3
    #  9: 2013-06-01 05:08:10  0.3
    # 10: 2013-06-02 18:37:46  0.3
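
    The question's actual bdf and sdf aren't shown, so here is a minimal, self-contained sketch on hypothetical toy data. It assumes both tables have a POSIXct column ts and that sdf's y is the value to copy into bdf as the new column z. It uses on = "ts" instead of setkey and writes the matched values into bdf by reference with :=; it is a sketch under those assumptions, not tested against the original data.

    ```r
    library(data.table)

    # Hypothetical toy data: the real bdf/sdf from the question are not shown.
    # Both tables are assumed to share a POSIXct column `ts`; sdf carries `y`.
    sdf <- data.frame(
      ts = as.POSIXct(c("2013-05-20 00:00:00", "2013-05-24 00:00:00",
                        "2013-05-27 00:00:00"), tz = "UTC"),
      y  = c(0.2, -0.1, 0.3)
    )
    bdf <- data.frame(
      ts = as.POSIXct(c("2013-05-19 12:00:00", "2013-05-23 00:00:00",
                        "2013-05-25 00:00:00", "2013-06-01 00:00:00"), tz = "UTC")
    )

    # Convert both to data.tables in place (no copy)
    setDT(sdf)
    setDT(bdf)

    # For every row of bdf, find the sdf row with the nearest ts and
    # store its y in a new column z, added to bdf by reference.
    bdf[, z := sdf[bdf, y, on = "ts", roll = "nearest"]]
    print(bdf)
    ```

    With these toy timestamps z comes out as 0.2, -0.1, -0.1, 0.3. The first bdf row lies before every sdf row and the last one after, and roll = "nearest" rolls past both ends by default.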
    
    • setDT converts a data.frame to a data.table by reference, i.e. without copying it.

    • setkey sorts the data.table by reference in increasing order of the columns provided, and marks those columns as key columns (so that we can join on them later).

    • In data.table, x[i] performs a join when i is a data.table. I'll refer you to this answer to catch up on data.table joins if you're not already familiar with them.

    • x[i] performs an equi-join. That is, for every row in i it finds the matching row indices in x, then returns those rows of x together with the corresponding row of i. If a row in i has no match in x, the x columns for that row are NA by default.

      However, x[i, roll = ...] performs a rolling join. When there's no exact match, the last observation can be carried forward (roll = TRUE or Inf), the next observation can be carried backward (roll = -Inf), or the nearest observation can be used (roll = "nearest"). In this case, IIUC, you need roll = "nearest".
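
    To see the equi-join and the three roll settings side by side, here is a small sketch on hypothetical toy tables x and i (made up for illustration), using on = "ts" instead of setkey. The comments assume data.table's default rollends: roll = TRUE rolls past the last row of x but not before the first, roll = -Inf does the reverse, and roll = "nearest" rolls past both ends.

    ```r
    library(data.table)

    # Hypothetical toy tables, for illustration only
    x <- data.table(ts = c(1, 5, 10), val = c("a", "b", "c"))
    i <- data.table(ts = c(0, 4, 5, 12))

    x[i, on = "ts"]                    # equi-join: only ts == 5 matches; val is NA, NA, "b", NA
    x[i, on = "ts", roll = TRUE]       # LOCF: val is NA, "a", "b", "c"
    x[i, on = "ts", roll = -Inf]       # NOCB: val is "a", "b", "b", NA
    x[i, on = "ts", roll = "nearest"]  # nearest: val is "a", "b", "b", "c"
    ```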

    HTH
