How to do the below (straightforward using sqldf) using data.table and get exact same result:
library(data.table)
whatWasMeasured <- data.table(start=as.POSIXct(seq(1, 1000, 100),
origin="1970-01-01 00:00:00"),
end=as.POSIXct(seq(10, 1000, 100), origin="1970-01-01 00:00:00"),
x=1:10,
y=letters[1:10])
measurments <- data.table(time=as.POSIXct(seq(1, 2000, 1),
origin="1970-01-01 00:00:00"),
temp=runif(2000, 10, 100))
## Alternative short names for data.tables
dt1 <- whatWasMeasured
dt2 <- measurments
## Straightforward with sqldf
library(sqldf)
sqldf("select * from measurments m, whatWasMeasured wwm
where m.time between wwm.start and wwm.end")
Arun
You can use the foverlaps()
function which implements joins over intervals efficiently. In your case, we just need a dummy column for measurments
.
Note 1: You should install the development version of data.table -
v1.9.5
as a bug withfoverlaps()
has been fixed there. You can find the installation instructions here.Note 2: I'll call
whatWasMeasured
=dt1
andmeasurments
=dt2
here for convenience.
require(data.table) ## 1.9.5+
dt2[, dummy := time]
setkey(dt1, start, end)
ans = foverlaps(dt2, dt1, by.x=c("time", "dummy"), nomatch=0L)[, dummy := NULL]
See ?foverlaps
for more info and this post for a performance comparison.
来源:https://stackoverflow.com/questions/27487949/how-to-perform-join-over-date-ranges-using-data-table