I am seeking help after having wasted almost a day. I have a big data frame (bdf) and a small data frame (sdf). I want to add a variable z to bdf depending on the values in sdf.
Here's my approach:
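library(zoo)
# Midpoints between consecutive sdf timestamps; Inf is appended as a final upper bound
m <- c(rollmean(as.POSIXct(sdf$ts), 2), Inf)
# For each tb, pick y at the first element of m that is later than tb
transform(bdf, z = sdf$y[sapply(tb, function(x) which.max(x < m))])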
# tb z
#1 2013-05-19 17:11:22 0.2
#2 2013-05-21 06:40:58 0.2
#3 2013-05-22 20:10:34 0.2
#4 2013-05-24 09:40:10 -0.1
#5 2013-05-25 23:09:46 -0.1
#6 2013-05-27 12:39:22 0.3
#7 2013-05-29 02:08:58 0.3
#8 2013-05-30 15:38:34 0.3
#9 2013-06-01 05:08:10 0.3
#10 2013-06-02 18:37:46 0.3
Update: removed conversion to numeric (not required)
Brief explanation:
- as.POSIXct(sdf$ts) converts the dates to POSIXct-style date-times.
- rollmean(as.POSIXct(sdf$ts), 2) computes the rolling mean of each two consecutive rows, i.e. the midpoint between neighbouring timestamps in sdf. This happens to be exactly the time you want to use for separating the observations. rollmean is from package zoo. Computing rollmean(.., 2) means the output vector is one element shorter than the input vector.
- That is why rollmean is wrapped in c(.., Inf): the infinite value is appended to the rollmean vector as its last element. This ensures that the last value of y in sdf can also be returned (0.3 in the specific example).
- transform adds the z column to bdf.
- sapply(tb, function(x) which.max(x < m)) loops through the entries in bdf$tb and, for each entry, finds the first index at which that entry is less (earlier) than m (the vector of rollmean entries plus Inf). which.max applied to a logical vector returns the position of the first TRUE, so exactly one index is returned for each bdf$tb entry.
- sdf$y[sapply(tb, function(x) which.max(x < m))] extracts the corresponding elements of sdf$y, which are then stored in the new z column of bdf.
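To watch the individual steps at work, here is a minimal sketch on made-up data. The timestamps are hypothetical (the real sdf and bdf are not shown in this post), and the midpoints are computed in base R on the numeric seconds instead of with zoo's rollmean, so that the snippet runs without extra packages. The findInterval line at the end is a vectorized alternative that is not part of the approach above; it is only there as a cross-check.

```r
# Hypothetical toy data, invented for illustration only
sdf <- data.frame(
  ts = as.POSIXct(c("2013-05-20 00:00:00",
                    "2013-05-26 00:00:00",
                    "2013-05-28 00:00:00"), tz = "UTC"),
  y  = c(0.2, -0.1, 0.3)
)
bdf <- data.frame(
  tb = as.POSIXct(c("2013-05-19 12:00:00",
                    "2013-05-24 00:00:00",
                    "2013-05-27 12:00:00",
                    "2013-06-02 00:00:00"), tz = "UTC")
)

# Midpoints between consecutive sdf timestamps, in seconds since the epoch.
# Base R stand-in for rollmean(..., 2); the result is one element shorter
# than sdf$ts, so Inf is appended as the upper bound of the last interval.
t_num <- as.numeric(sdf$ts)
m <- c((head(t_num, -1) + tail(t_num, -1)) / 2, Inf)

# which.max() on a logical vector returns the position of the first TRUE,
# i.e. the first midpoint that lies later than the given bdf timestamp.
idx <- sapply(as.numeric(bdf$tb), function(x) which.max(x < m))
# idx is 1 2 3 3

# Copy the matching y values into the new column z
bdf$z <- sdf$y[idx]

# Vectorized cross-check (assumption: the midpoints are sorted ascending):
# findInterval() counts how many finite midpoints are <= each timestamp.
idx2 <- findInterval(as.numeric(bdf$tb), head(m, -1)) + 1L
stopifnot(all(idx == idx2))
```

Because the breakpoints are midpoints, each row of bdf ends up with the y value of the sdf row whose timestamp is closest to it.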
Hope that helps