Calculation of anomalies on a time series

Question

I'd like to calculate monthly temperature anomalies for a time series with several stations. By "anomaly" I mean the difference between a single value and a mean calculated over a reference period.

My data frame looks like this (let's call it "data"):

Station Year Month Temp
A 1950 1 15.6
A 1980 1 12.3
A 1990 2 11.4
A 1950 1 15.6
B 1970 1 12.3
B 1977 2 11.4
B 1977 4 18.6
B 1980 1 12.3
B 1990 11 7.4
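To make the examples runnable, here is a minimal sketch that reads the table above into a data frame named data, as in the question. It assumes the columns are whitespace-separated exactly as shown.

```r
# Read the example table from the question into a data frame called data
data <- read.table(header = TRUE, text = "Station Year Month Temp
A 1950 1 15.6
A 1980 1 12.3
A 1990 2 11.4
A 1950 1 15.6
B 1970 1 12.3
B 1977 2 11.4
B 1977 4 18.6
B 1980 1 12.3
B 1990 11 7.4")

str(data)
```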

First, I subset the data to the years from 1980 to 1990:

data2 <- subset(data, Year >= 1980 & Year <= 1990)

Second, I used plyr to calculate the monthly mean for each station over 1980-1990 (let's call it "MeanBase"):

library(plyr)
data3 <- ddply(data2, .(Station, Month), summarise,
               MeanBase = mean(Temp, na.rm = TRUE))

Now I'd like to calculate, for each row of data, the difference between Temp and the corresponding MeanBase. I'm not sure I'm on the right track, though, because I don't see how to use data3.
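One way to use data3 is to join it back onto the full data by Station and Month and then subtract. This sketch is an alternative to the answer below, not the answer's own method. It makes two substitutions:

* It uses base R aggregate() in place of plyr's ddply(), so it runs without add-on packages. If plyr is installed, the question's data3 can go straight into the merge() step.
* It passes all.x = TRUE to merge(), so rows with no base-period mean are kept.

The names anom and Anomaly are illustrative.

```r
data <- read.table(header = TRUE, text = "Station Year Month Temp
A 1950 1 15.6
A 1980 1 12.3
A 1990 2 11.4
A 1950 1 15.6
B 1970 1 12.3
B 1977 2 11.4
B 1977 4 18.6
B 1980 1 12.3
B 1990 11 7.4")

# Steps 1-2 from the question: base-period (1980-1990) monthly mean per station.
# aggregate() stands in for plyr::ddply(..., summarise, ...) here.
data2 <- subset(data, Year >= 1980 & Year <= 1990)
data3 <- aggregate(Temp ~ Station + Month, data = data2, FUN = mean)
names(data3)[names(data3) == "Temp"] <- "MeanBase"

# Step 3: attach the matching MeanBase to every row of data, then subtract.
# all.x = TRUE keeps rows whose Station/Month has no base-period mean (MeanBase is NA).
anom <- merge(data, data3, by = c("Station", "Month"), all.x = TRUE)
anom$Anomaly <- anom$Temp - anom$MeanBase
anom
```

This differs from the ave() answer in two ways. Combinations with no base-period data come out as NA rather than NaN. merge() also sorts the rows by the key columns, so reorder the result if the original row order matters.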

Answer 1:

You can do this with ave in base R.
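ave(x, ..., FUN) works in three steps:

* It splits x by the grouping variables passed in ...
* It applies FUN within each group.
* It returns a vector with the same length and order as x, so every element receives its own group's result.

Here is a toy sketch with made-up values; x and g are illustrative and not from the question.

```r
x <- c(10, 20, 30, 40)
g <- c("a", "b", "a", "b")

# Group means are written back to each element's original position
ave(x, g)
# → 20 30 20 30

# Setting element 3 to NA (as replace() does for years outside the base
# period) removes it from group "a"'s mean, yet position 3 still gets that mean
masked <- replace(x, 3, NA)
ave(masked, g, FUN = function(v) mean(v, na.rm = TRUE))
# → 10 30 10 30
```

In the answer, both Station and Month are passed as grouping variables, so each group is one Station-Month combination.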

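# Mask temperatures outside the 1980-1990 base period, average the rest within
# each Station x Month group, and subtract that mean from every row
transform(data,
          Demeaned = Temp - ave(replace(Temp, Year < 1980 | Year > 1990, NA),
                                Station, Month, FUN = function(t) mean(t, na.rm = TRUE)))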

#   Station Year Month Temp Demeaned
# 1       A 1950     1 15.6      3.3
# 2       A 1980     1 12.3      0.0
# 3       A 1990     2 11.4      0.0
# 4       A 1950     1 15.6      3.3
# 5       B 1970     1 12.3      0.0
# 6       B 1977     2 11.4      NaN
# 7       B 1977     4 18.6      NaN
# 8       B 1980     1 12.3      0.0
# 9       B 1990    11  7.4      0.0

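replace() sets values outside 1980-1990 to NA, so they are left out of the mean but still get a Demeaned value. The result column will have NaN for Station-Month combinations that have no years in your specified range. Once na.rm = TRUE drops the NAs, those groups are empty, and the mean of an empty vector is NaN.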
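Here is a sketch that checks this on the question's example data and lists the affected combinations with is.nan(). The data are re-entered with read.table(), and res and no_base are illustrative names.

```r
# With na.rm = TRUE an all-NA vector becomes empty, and its mean is NaN
mean(c(NA_real_, NA_real_), na.rm = TRUE)
# → NaN

data <- read.table(header = TRUE, text = "Station Year Month Temp
A 1950 1 15.6
A 1980 1 12.3
A 1990 2 11.4
A 1950 1 15.6
B 1970 1 12.3
B 1977 2 11.4
B 1977 4 18.6
B 1980 1 12.3
B 1990 11 7.4")

# Same computation as in the answer
res <- transform(data,
                 Demeaned = Temp - ave(replace(Temp, Year < 1980 | Year > 1990, NA),
                                       Station, Month,
                                       FUN = function(t) mean(t, na.rm = TRUE)))

# Station/Month combinations with no observations in the 1980-1990 base period
no_base <- unique(res[is.nan(res$Demeaned), c("Station", "Month")])
no_base
```

For the example data, this returns station B, months 2 and 4.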

Source: https://stackoverflow.com/questions/16420267/calculation-of-anomalies-on-time-series

Tags

time-series

plyr