I have a large (150,000x7) dataframe that I intend to use for back-testing and real-time analysis of a financial market. The data represents the condition of an investment v
1) If DF is the data frame shown in the question, create a zoo object from it as you have done and split it into days, giving zs. Then lapply your function f to each successive set of w points in each component (i.e. in each day). For example, if you want to apply your function to 2 hours of data at a time and your data is regularly spaced 5-minute data, then w = 24 (since there are 24 five-minute periods in two hours). In that case f is passed 24 rows of data as a matrix each time it is called. Also, align has been set to "right" below, but it can alternatively be set to align = "center", and the condition giving ix can be changed to a double-sided one, etc. For more on rollapply see ?rollapply.
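library(zoo)
z <- zoo(DF[-2], as.POSIXct(DF[,1], origin = "1970-01-01"))
zs <- split(z, as.Date(time(z)))  # one zoo object per day
w <- 3 # replace this with 24 to handle two hours at a time with five min data
f <- function(x) {
  tt <- x[, 1]                    # first column holds the time in seconds
  ix <- tt[w] - tt <= w * 5 * 60  # RHS converts w to seconds
  x <- x[ix, -1]                  # keep only rows close enough to the last row
  sum(x) # replace sum with your function
}
# roll f over each day separately so that no window spans two days
out <- lapply(zs, rollapply, w, f, by.column = FALSE, align = "right")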
Using the data frame in the question we get this:
> out
$`2008-05-30`
2008-05-30 02:00:00 2008-05-30 02:05:00 2008-05-30 02:10:00 2008-05-30 02:15:00
          -66.04703           -83.92148           -95.93558          -100.24924
2008-05-30 02:20:00 2008-05-30 02:25:00 2008-05-30 02:30:00 2008-05-30 02:35:00
         -108.15038          -121.24519          -134.39873          -140.28436
By the way, be sure to read this post.
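To see what the ix condition inside f does when a window straddles a hole, here is a minimal sketch in base R on made-up data. The matrix m, its column names, and the timestamps are hypothetical and are not the data from the question. With w = 3, any row more than 15 minutes (w * 5 * 60 seconds) older than the last row of the window is dropped before the sum is taken.

```r
# Hypothetical data: times in seconds, with a 20-minute hole after the third row
tt <- 300 * c(0, 1, 2, 6, 7)
m  <- cbind(time = tt, value = c(1, 2, 3, 4, 5))

w <- 3
f <- function(x) {
  tt <- x[, 1]
  ix <- tt[w] - tt <= w * 5 * 60  # TRUE for rows within 15 minutes of the last row
  sum(x[ix, -1])                  # rows on the far side of the hole are excluded
}

# apply f to every window of w consecutive rows
res <- sapply(w:nrow(m), function(i) f(m[seq(to = i, length.out = w), ]))
res
# 6 4 9
```

The first window has no hole, so all three values are summed. The second window keeps only its last row, and the third keeps its last two rows, because the earlier rows are on the other side of the hole.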
2) This could alternatively be done as follows, where w and f are as above:
n <- nrow(DF)
m <- as.matrix(DF[-2])
# apply f to each window of w consecutive rows ending at row i
sapply(w:n, function(i) f(m[seq(to = i, length.out = w), ]))
Replace sapply with lapply if needed. Also, this may seem shorter than the first solution, but it is not much different once you add the code to define f and w (which appear in the first but not the second).
If there are no holes during the day and only holes between days then these solutions could be simplified.
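One possible reading of that simplification, sketched in base R: if every day is gap-free, the time check inside f is no longer needed, and it is enough to split the rows by day and roll the function over each day's rows. The data frame, the column names time and value, and the helper names g and roll_day below are assumptions made for illustration, not part of the question.

```r
# Hypothetical data: two days of gap-free 5-minute observations
day1 <- as.POSIXct("2008-05-30 02:00:00", tz = "UTC") + 300 * 0:3
day2 <- as.POSIXct("2008-05-31 02:00:00", tz = "UTC") + 300 * 0:3
DF <- data.frame(time = c(day1, day2), value = 1:8)

w <- 3
g <- function(x) sum(x)  # replace sum with your function; no time check needed

# roll g over the rows of a single day
roll_day <- function(d) {
  vals <- as.matrix(d[-1])                     # drop the time column
  if (nrow(vals) < w) return(numeric(0))       # day too short for one window
  sapply(w:nrow(vals), function(i)
    g(vals[seq(to = i, length.out = w), , drop = FALSE]))
}

# split by calendar day so that no window spans two days
out <- lapply(split(DF, as.Date(DF$time, tz = "UTC")), roll_day)
out
```

Each list component holds the rolling results for one day, so windows never mix observations from either side of an overnight hole.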