Creating variable in R data frame depending on another data frame

后悔当初 2020-12-31 20:34

I am seeking help after having wasted almost a day. I have a big data frame (bdf) and a small data frame (sdf). I want to add a variable z to bdf depending on the values in sdf.

4 Answers
  •  既然无缘
    2020-12-31 21:07

    Here's my approach:

    library(zoo)
    m <- c(rollmean(as.POSIXct(sdf$ts), 2), Inf)
    transform(bdf, z = sdf$y[sapply(tb, function(x) which.max(x < m))])
    #                    tb    z
    #1  2013-05-19 17:11:22  0.2
    #2  2013-05-21 06:40:58  0.2
    #3  2013-05-22 20:10:34  0.2
    #4  2013-05-24 09:40:10 -0.1
    #5  2013-05-25 23:09:46 -0.1
    #6  2013-05-27 12:39:22  0.3
    #7  2013-05-29 02:08:58  0.3
    #8  2013-05-30 15:38:34  0.3
    #9  2013-06-01 05:08:10  0.3
    #10 2013-06-02 18:37:46  0.3
    

    Update: removed conversion to numeric (not required)

    Brief explanation:

    • as.POSIXct(sdf$ts) converts the dates to POSIXct-style date-times
    • rollmean(as.POSIXct(sdf$ts), 2) computes the rolling mean of each pair of consecutive rows, i.e. the midpoint between neighbouring timestamps. This is exactly the time you want to use for separating the observations. rollmean is from package zoo. With a window of 2, the output vector is one element shorter than the input vector.
    • That is why I wrap the result of rollmean in c(.., Inf), which appends infinity to the rollmean vector as the last value. This ensures that bdf$tb entries later than the last midpoint are still matched, so the last value of sdf$y is also returned (0.3 in the specific example).
    • I use transform to add the z column to bdf
    • sapply(tb, function(x) which.max(x < m)) loops through the entries in bdf$tb and, for each entry, finds the first index at which the entry is less (earlier) than m (which holds the vector of rollmean entries). which.max applied to a logical vector returns the position of the first TRUE, so exactly one index is returned for each bdf$tb entry: that of the first midpoint lying after it.
    • That vector of indices is used in sdf$y[sapply(tb, function(x) which.max(x < m))] to extract the corresponding elements of sdf$y which will then be stored/copied to the new z column in bdf
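
The steps above can be sketched on toy data as follows. This is only an illustration under stated assumptions: the values in sdf and bdf below are made up (the asker's real data is not shown in the thread), the midpoints are computed by hand in base R rather than with zoo::rollmean, and the findInterval() line is a swapped-in vectorised alternative, not part of the original answer.

```r
# Hypothetical toy data (NOT the asker's real data).
sdf <- data.frame(
  ts = as.POSIXct(c("2013-05-20", "2013-05-25", "2013-05-30"), tz = "UTC"),
  y  = c(0.2, -0.1, 0.3)
)
bdf <- data.frame(
  tb = as.POSIXct(c("2013-05-19", "2013-05-22", "2013-05-23",
                    "2013-05-28", "2013-06-02"), tz = "UTC")
)

# Midpoints between consecutive sdf timestamps (seconds since the epoch),
# with Inf appended so that late bdf entries still find a match.
ts_num <- as.numeric(sdf$ts)
m <- c((head(ts_num, -1) + tail(ts_num, -1)) / 2, Inf)

# which.max() on a logical vector returns the position of the first TRUE.
first_true <- which.max(c(FALSE, TRUE, TRUE))

# For each bdf timestamp, pick the first midpoint that lies after it.
idx <- vapply(as.numeric(bdf$tb), function(x) which.max(x < m), integer(1))
bdf$z <- sdf$y[idx]

# Assumed alternative: the same indices without an explicit loop.
idx2 <- findInterval(as.numeric(bdf$tb), head(m, -1)) + 1L
```

On large data frames the findInterval() variant may be preferable, since it avoids calling a function once per row of bdf.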

    Hope that helps
