Question
I am working with a dataset that has hourly temperature readings, 24 hours a day, for 100+ years. I want to compute an average temperature for each day to reduce the size of the dataset. The data look like this:
YR MO DA HR MN TEMP
1943 6 19 10 0 73
1943 6 19 11 0 72
1943 6 19 12 0 76
1943 6 19 13 0 78
1943 6 19 14 0 81
1943 6 19 15 0 85
1943 6 19 16 0 85
1943 6 19 17 0 86
1943 6 19 18 0 86
1943 6 19 19 0 87
and so on, for 600,000+ data points.
How can I run a nested function to calculate the daily average temperature so that I preserve YR, MO, DA, and TEMP? Once I have this, I want to look at long-term averages, for example the average temperature for the month of January across 30 years. How do I do this?
Answer 1:
In one step you could do this:
meanTbl <- with(dat, tapply(TEMP, ISOdate(YR, MO, DA), mean) )
This gives you a date-time formatted index as well as the values. If you want just the date as a character index, without the trailing time:
meanTbl <- with(dat, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean) )
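The monthly averages across all years can be computed from the original data frame (meanTbl is a named array, not a data frame, so it has no TEMP or MO columns):
monMeans <- with(dat, tapply(TEMP, MO, mean))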
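Since tapply returns a named array rather than a data frame, the following is a minimal sketch of one way to get back a table with YR, MO, DA and TEMP columns. The sample values and the names dat, daily and dates are made up for illustration; only the column layout comes from the question.

```r
# Hypothetical sample data in the layout from the question
dat <- data.frame(
  YR   = c(1943, 1943, 1943, 1943),
  MO   = c(6, 6, 6, 6),
  DA   = c(19, 19, 20, 20),
  HR   = c(10, 11, 10, 11),
  MN   = 0,
  TEMP = c(73, 72, 80, 82)
)

# Daily means, indexed by calendar date
meanTbl <- with(dat, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean))

# The dates are stored as the names of the array; parse them back
dates <- as.Date(names(meanTbl))

# Rebuild a data frame with one row per day
daily <- data.frame(
  YR   = as.integer(format(dates, "%Y")),
  MO   = as.integer(format(dates, "%m")),
  DA   = as.integer(format(dates, "%d")),
  TEMP = as.vector(meanTbl)
)
```

With the result in this shape, with(daily, tapply(TEMP, MO, mean)) gives the mean of the daily means per calendar month.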
回答2:
You can do it with aggregate:
# daily means
aggregate(TEMP ~ YR + MO + DA, FUN=mean, data=data)
# monthly means
aggregate(TEMP ~ YR + MO, FUN=mean, data=data)
# yearly means
aggregate(TEMP ~ YR, FUN=mean, data=data)
# monthly means independent of year
aggregate(TEMP ~ MO, FUN=mean, data=data)
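For the "January across 30 years" part of the question, one possible approach is to restrict the data to the years of interest before aggregating. The sketch below uses made-up readings, and the 1961 to 1990 window and the names temps, period and normals are assumptions chosen only for illustration. It averages the daily means rather than the raw readings, so days with more readings do not weigh more than days with fewer.

```r
# Hypothetical readings: some inside and one outside the chosen window
temps <- data.frame(
  YR   = c(1961, 1961, 1961, 1962, 1962, 1950),
  MO   = c(1, 1, 6, 1, 1, 1),
  DA   = c(1, 1, 1, 1, 2, 1),
  TEMP = c(30, 34, 70, 28, 32, 10)
)

# Keep only the chosen 30-year window (the years are an assumption)
period <- subset(temps, YR >= 1961 & YR <= 1990)

# Step 1: daily means within the window
daily <- aggregate(TEMP ~ YR + MO + DA, data = period, FUN = mean)

# Step 2: mean of the daily means for each calendar month
normals <- aggregate(TEMP ~ MO, data = daily, FUN = mean)

# Long-term January average
jan <- normals$TEMP[normals$MO == 1]
```

If every day has the same number of readings, this gives the same result as aggregating the raw readings by MO directly.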
Answer 3:
Your first question can be answered using the plyr package:
library(plyr)
daily_mean = ddply(df, .(YR, MO, DA), summarise, mean_temp = mean(TEMP))
By analogy with the above solution, to get monthly means:
monthly_mean = ddply(df, .(YR, MO), summarise, mean_temp = mean(TEMP))
or to get monthly averages over the whole dataset (e.g. 30 years, known as normals in climatology), not per year:
monthly_mean_normals = ddply(df, .(MO), summarise, mean_temp = mean(TEMP))
Source: https://stackoverflow.com/questions/15105670/how-to-calculate-average-values-large-datasets