How to calculate average values for large datasets

Question

I am working with a dataset of hourly temperature readings, 24 hours a day, spanning 100+ years. I want to compute an average temperature for each day to reduce the size of my dataset. The data look like this:

    YR MO DA HR MN TEMP
  1943  6 19 10  0   73
  1943  6 19 11  0   72
  1943  6 19 12  0   76
  1943  6 19 13  0   78
  1943  6 19 14  0   81
  1943  6 19 15  0   85
  1943  6 19 16  0   85
  1943  6 19 17  0   86
  1943  6 19 18  0   86
  1943  6 19 19  0   87

and so on, for 600,000+ data points.

How can I run a nested function to calculate the daily average temperature so that I preserve YR, MO, DA and TEMP? Once I have this, I want to be able to look at long-term averages and calculate, say, the average temperature for the month of January across 30 years. How do I do this?

Answer 1:

In one step you could do this:

 meanTbl <- with(dat, tapply(TEMP, ISOdate(YR, MO, DA), mean))

This gives you a date-time formatted index as well as the values. If you want just the date as a character string, without the trailing time:

 meanTbl <- with(dat, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean))

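The monthly averages (across all years) can then be computed from the daily means. meanTbl is a named array rather than a data frame, so the month has to be extracted from its date names:

 monMeans <- tapply(meanTbl, format(as.Date(names(meanTbl)), "%m"), mean)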
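The question asks to keep YR, MO, DA and TEMP as columns, whereas tapply returns a named array. A minimal sketch of one way to rebuild a data frame from that array follows; the sample data and the names `dates` and `daily` are made up for illustration and are not part of the original answer.

```r
# Hypothetical sample data in the layout shown in the question
dat <- data.frame(
  YR = 1943, MO = 6, DA = rep(19:20, each = 3),
  HR = rep(10:12, times = 2), MN = 0,
  TEMP = c(73, 72, 76, 78, 81, 85)
)

# Daily means; the array names are date strings such as "1943-06-19"
meanTbl <- with(dat, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean))

# Parse the names back into dates and split them into separate columns
dates <- as.Date(names(meanTbl))
daily <- data.frame(
  YR   = as.integer(format(dates, "%Y")),
  MO   = as.integer(format(dates, "%m")),
  DA   = as.integer(format(dates, "%d")),
  TEMP = as.vector(meanTbl)
)
print(daily)
```

The resulting `daily` data frame has one row per day and can be fed into further tapply or aggregate calls in the same way as the hourly data.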

Answer 2:

You can do it with aggregate:

# daily means
aggregate(TEMP ~ YR + MO + DA, FUN=mean, data=data) 

# monthly means 
aggregate(TEMP ~ YR + MO, FUN=mean, data=data)

# yearly means
aggregate(TEMP ~ YR, FUN=mean, data=data)

# monthly means independent of year
aggregate(TEMP ~ MO, FUN=mean, data=data)
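For the specific case raised in the question, the January average over a 30-year window, one possible approach is to subset the years before aggregating. This is only a sketch: the sample data, the 1961 to 1990 window, and the names `period`, `normals` and `jan_mean` are assumptions chosen for illustration.

```r
# Hypothetical sample: two readings in January and July for 1961-1963
data <- expand.grid(HR = c(0, 12), MO = c(1, 7), YR = 1961:1963)
data$DA <- 1
data$MN <- 0
data$TEMP <- ifelse(data$MO == 1, 30, 80) + (data$YR - 1961)

# Keep only the years of interest (window chosen for illustration)
period <- subset(data, YR >= 1961 & YR <= 1990)

# Monthly means independent of year, within that window
normals <- aggregate(TEMP ~ MO, FUN = mean, data = period)

# Pick out January
jan_mean <- normals$TEMP[normals$MO == 1]
print(jan_mean)
```

Note that this averages all hourly readings in the window directly. If days have unequal numbers of readings, averaging the daily means first can give a slightly different result.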

Answer 3:

Your first question can be answered using the plyr package:

library(plyr)
daily_mean = ddply(df, .(YR, MO, DA), summarise, mean_temp = mean(TEMP))

Analogously, to get monthly means per year:

monthly_mean = ddply(df, .(YR, MO), summarise, mean_temp = mean(TEMP))

or, to get monthly averages over the whole dataset rather than per year (averages of this kind over 30 years are known as normals in climatology):

monthly_mean_normals = ddply(df, .(MO), summarise, mean_temp = mean(TEMP))
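If the record contains missing readings (an assumption; the question does not say whether it does), mean() returns NA for any group that includes one. A small base R sketch of the effect, using made-up data and the hypothetical names `with_na` and `without_na`:

```r
# Hypothetical sample with one missing reading on day 19
df <- data.frame(
  YR = 1943, MO = 6, DA = c(19, 19, 20, 20),
  TEMP = c(70, NA, 80, 90)
)

# Without na.rm, a single NA makes that day's mean NA
with_na <- with(df, tapply(TEMP, DA, mean))

# na.rm = TRUE is passed through to mean() and drops the missing reading
without_na <- with(df, tapply(TEMP, DA, mean, na.rm = TRUE))

print(with_na)
print(without_na)
```

The same argument can be used inside the ddply calls, for example mean(TEMP, na.rm = TRUE).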

Source: https://stackoverflow.com/questions/15105670/how-to-calculate-average-values-large-datasets

Tags

time-series

average

plyr