I have a dataset filled with the average windspeed per hour for multiple years. I would like to create an \'average year\', in which for each hour the average windspeed for
I predict that ddply and the plyr package are going to be your best friend :). I created a 30 year dataset with hourly random windspeeds between 1 and 10 ms:
begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
# 30 year dataset
dat = data.frame(dt = begin_date + (0:(24*30*365)) * (3600))
dat = within(dat, {
speed = runif(length(dt), 1, 10)
unique_day = strftime(dt, "%d-%m")
})
> head(dat)
dt unique_day speed
1 1990-01-01 00:00:00 01-01 7.054124
2 1990-01-01 01:00:00 01-01 2.202591
3 1990-01-01 02:00:00 01-01 4.111633
4 1990-01-01 03:00:00 01-01 2.687808
5 1990-01-01 04:00:00 01-01 8.643168
6 1990-01-01 05:00:00 01-01 5.499421
To calculate the daily normalen (30 year average, this term is much used in meteorology) over this 30 year period:
library(plyr)
res = ddply(dat, .(unique_day),
summarise, mean_speed = mean(speed), .progress = "text")
> head(res)
unique_day mean_speed
1 01-01 5.314061
2 01-02 5.677753
3 01-03 5.395054
4 01-04 5.236488
5 01-05 5.436896
6 01-06 5.544966
This takes just a few seconds on my humble two core AMD, so I suspect just going once through the data is not needed. Multiple of these ddply
calls for different aggregations (month, season etc) can be done separately.