Question
Sorry if this has already been posted, but I looked really hard and could not find anything.
I am working with 30 years of monthly temperature observations, from January 1960 to December 1989. The data look like this:
> head(df)
date temp
1 1960-01-01 22.92235
2 1960-02-01 23.07059
3 1960-03-01 23.10941
4 1960-04-01 20.78353
5 1960-05-01 17.45176
6 1960-06-01 17.31765
First, I need to average all Januaries, all Februaries, all Marches, etc. over the whole period.
Then, I would like to do the same for specific periods of time (3 years, 5 years, 10 years, etc.).
For example,
- The average of all Jan, Feb, Mar, etc. between 1960 and 1964;
- The average of all Jan, Feb, Mar, etc. between 1965 and 1969;
- and so on.
The final result would consist of month, period and temperature, something like this:
Month Period Temp
Jan 1960-1989 17
Feb 1960-1989 12
Mar 1960-1989 7
Apr 1960-1989 9
May 1960-1989 15
Jun 1960-1989 12
Jul 1960-1989 17
Aug 1960-1989 22
Sep 1960-1989 21
Oct 1960-1989 21
Nov 1960-1989 18
Dec 1960-1989 17
Jan 1960-1964 17
Feb 1960-1964 12
Mar 1960-1964 7
Apr 1960-1964 9
May 1960-1964 9
Jun 1960-1964 11
Jul 1960-1964 14
Aug 1960-1964 18
Sep 1960-1964 13
Oct 1960-1964 12
Nov 1960-1964 17
Dec 1960-1964 11
Any ideas on how to do this?
In case it is useful, here is a reproducible copy of my dataset:
df <- structure(list(date = structure(c(-3653, -3622, -3593, -3562,
-3532, -3501, -3471, -3440, -3409, -3379, -3348, -3318, -3287,
-3256, -3228, -3197, -3167, -3136, -3106, -3075, -3044, -3014,
-2983, -2953, -2922, -2891, -2863, -2832, -2802, -2771, -2741,
-2710, -2679, -2649, -2618, -2588, -2557, -2526, -2498, -2467,
-2437, -2406, -2376, -2345, -2314, -2284, -2253, -2223, -2192,
-2161, -2132, -2101, -2071, -2040, -2010, -1979, -1948, -1918,
-1887, -1857, -1826, -1795, -1767, -1736, -1706, -1675, -1645,
-1614, -1583, -1553, -1522, -1492, -1461, -1430, -1402, -1371,
-1341, -1310, -1280, -1249, -1218, -1188, -1157, -1127, -1096,
-1065, -1037, -1006, -976, -945, -915, -884, -853, -823, -792,
-762, -731, -700, -671, -640, -610, -579, -549, -518, -487, -457,
-426, -396, -365, -334, -306, -275, -245, -214, -184, -153, -122,
-92, -61, -31, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304,
334, 365, 396, 424, 455, 485, 516, 546, 577, 608, 638, 669, 699,
730, 761, 790, 821, 851, 882, 912, 943, 974, 1004, 1035, 1065,
1096, 1127, 1155, 1186, 1216, 1247, 1277, 1308, 1339, 1369, 1400,
1430, 1461, 1492, 1520, 1551, 1581, 1612, 1642, 1673, 1704, 1734,
1765, 1795, 1826, 1857, 1885, 1916, 1946, 1977, 2007, 2038, 2069,
2099, 2130, 2160, 2191, 2222, 2251, 2282, 2312, 2343, 2373, 2404,
2435, 2465, 2496, 2526, 2557, 2588, 2616, 2647, 2677, 2708, 2738,
2769, 2800, 2830, 2861, 2891, 2922, 2953, 2981, 3012, 3042, 3073,
3103, 3134, 3165, 3195, 3226, 3256, 3287, 3318, 3346, 3377, 3407,
3438, 3468, 3499, 3530, 3560, 3591, 3621, 3652, 3683, 3712, 3743,
3773, 3804, 3834, 3865, 3896, 3926, 3957, 3987, 4018, 4049, 4077,
4108, 4138, 4169, 4199, 4230, 4261, 4291, 4322, 4352, 4383, 4414,
4442, 4473, 4503, 4534, 4564, 4595, 4626, 4656, 4687, 4717, 4748,
4779, 4807, 4838, 4868, 4899, 4929, 4960, 4991, 5021, 5052, 5082,
5113, 5144, 5173, 5204, 5234, 5265, 5295, 5326, 5357, 5387, 5418,
5448, 5479, 5510, 5538, 5569, 5599, 5630, 5660, 5691, 5722, 5752,
5783, 5813, 5844, 5875, 5903, 5934, 5964, 5995, 6025, 6056, 6087,
6117, 6148, 6178, 6209, 6240, 6268, 6299, 6329, 6360, 6390, 6421,
6452, 6482, 6513, 6543, 6574, 6605, 6634, 6665, 6695, 6726, 6756,
6787, 6818, 6848, 6879, 6909, 6940, 6971, 6999, 7030, 7060, 7091,
7121, 7152, 7183, 7213, 7244, 7274), class = "Date"), temp = c(22.9223529411765,
23.0705882352941, 23.1094117647059, 20.7835294117647, 17.4517647058824,
17.3176470588235, 18.0494117647059, 19.6188235294118, 21.3023529411765,
23.1105882352941, 22.2364705882353, 22.7482352941176, 23.5870588235294,
24.0023529411765, 23.0094117647059, 22.0176470588235, 19.4917647058824,
18.1011764705882, 18.3164705882353, 20.0623529411765, 22.8717647058824,
23.2576470588235, 23.68, 22.3694117647059, 22.9517647058824,
23.6976470588235, 23.3294117647059, 20.8564705882353, 18.16,
15.8988235294118, 15.7988235294118, 18.4176470588235, 20.8423529411765,
20.3247058823529, 22.3070588235294, 22.2035294117647, 24.2235294117647,
23.6976470588235, 24.4082352941176, 21.1752941176471, 18.1023529411765,
16.1211764705882, 18.3164705882353, 19.7635294117647, 23.1294117647059,
22.9964705882353, 23.6552941176471, 22.6964705882353, 23.6011764705882,
23.6517647058824, 23.7035294117647, 22.4352941176471, 18.5835294117647,
16.5976470588235, 15.7741176470588, 19.2541176470588, 20.8776470588235,
20.5729411764706, 21.1729411764706, 21.5870588235294, 22.4576470588235,
23.6058823529412, 21.84, 21.6694117647059, 19.2458823529412,
18.7517647058824, 17.7811764705882, 19.4764705882353, 21.9270588235294,
21.5470588235294, 22.88, 23.2458823529412, 24.2776470588235,
25.2470588235294, 23.4694117647059, 21.4435294117647, 19.3941176470588,
18.5447058823529, 17.6, 18.3764705882353, 19.8529411764706, 22.0823529411765,
22.7294117647059, 23.4011764705882, 23.3611764705882, 24.2505882352941,
23.2870588235294, 21.9482352941176, 20.5552941176471, 18.0788235294118,
18.5929411764706, 20.8752941176471, 21.9023529411765, 23.6105882352941,
22.4070588235294, 21.5635294117647, 23.3129411764706, 22.9741176470588,
23.3670588235294, 19.6105882352941, 16.9941176470588, 17.7670588235294,
17.4858823529412, 17.8517647058824, 20.26, 22.1576470588235,
23.8364705882353, 23.4447058823529, 24.8129411764706, 25.1764705882353,
24.2694117647059, 21.5035294117647, 20.0458823529412, 18.4694117647059,
18.4541176470588, 19.5388235294118, 22.02, 20.5364705882353,
22.9858823529412, 21.9752941176471, 23.7729411764706, 24.0576470588235,
24.0941176470588, 22.1552941176471, 21.2329411764706, 19.5611764705882,
17.8788235294118, 18.6823529411765, 20.1541176470588, 21.6258823529412,
21.5211764705882, 23.9811764705882, 24.8352941176471, 24.5882352941176,
24.1729411764706, 21.1035294117647, 19.0435294117647, 17.08,
17.4529411764706, 19.1458823529412, 20.4447058823529, 20.7129411764706,
21.5047058823529, 22.6952941176471, 23.4364705882353, 23.1, 24.1847058823529,
19.8105882352941, 19.9847058823529, 20.5188235294118, 17.7658823529412,
19.4435294117647, 20.7588235294118, 21.7835294117647, 22.7788235294118,
23.2388235294118, 24.9129411764706, 25.6, 23.5647058823529, 24.0058823529412,
19.7823529411765, 19.3152941176471, 18.7741176470588, 19.0305882352941,
20.5576470588235, 21.3611764705882, 21.4247058823529, 23.4811764705882,
23.6505882352941, 25.1870588235294, 23.3541176470588, 21.4823529411765,
18.7364705882353, 17.7235294117647, 18.3976470588235, 19.7235294117647,
21.0741176470588, 21.6094117647059, 22.9635294117647, 22.4011764705882,
23.4152941176471, 24.7741176470588, 24.3270588235294, 20.7976470588235,
18.8764705882353, 17.7788235294118, 16.4129411764706, 21.4117647058824,
22.3317647058824, 21.66, 22.3694117647059, 23.0917647058824,
24.4541176470588, 23.2847058823529, 23.3164705882353, 21.2529411764706,
19.1258823529412, 17.3882352941176, 17.3823529411765, 19.0529411764706,
19.6576470588235, 20.2976470588235, 21.9023529411765, 23.3094117647059,
24.0117647058824, 25.5611764705882, 24.9129411764706, 21.3964705882353,
19.9870588235294, 18.3929411764706, 20.9917647058824, 20.3058823529412,
21.4435294117647, 23.1941176470588, 22.8388235294118, 22.5176470588235,
24.6317647058824, 24.6541176470588, 24.2, 20.84, 18.4576470588235,
17.5011764705882, 19.16, 20.54, 20.1517647058824, 22.6776470588235,
22.7470588235294, 22.7882352941176, 22.0811764705882, 24.2152941176471,
22.9235294117647, 20.8411764705882, 19.6188235294118, 17.16,
16.0529411764706, 20.3223529411765, 19.9752941176471, 22.5152941176471,
22.2705882352941, 23.1541176470588, 23.1047058823529, 23.9517647058824,
24.8176470588235, 22.18, 20.5023529411765, 17.3505882352941,
19.1917647058824, 19.9894117647059, 19.0235294117647, 22.8235294117647,
22.7094117647059, 23.8741176470588, 24.0517647058824, 25.1764705882353,
23.9235294117647, 21.2929411764706, 20.6117647058824, 17.1305882352941,
16.3470588235294, 19.6470588235294, 21.3341176470588, 20.2176470588235,
23.7435294117647, 22.6741176470588, 22.9070588235294, 24.7152941176471,
23.2905882352941, 20.5776470588235, 18.9635294117647, 19.0658823529412,
18.8423529411765, 20.0729411764706, 21.3047058823529, 22.1588235294118,
24.0388235294118, 22.1917647058824, 24.0517647058824, 24.8729411764706,
23.0117647058824, 23, 21.3094117647059, 19.4105882352941, 20.3470588235294,
19.4482352941176, 20.0670588235294, 21.6364705882353, 23.4211764705882,
23.16, 25.4788235294118, 26.4741176470588, 24.0482352941176,
21.4176470588235, 21.7164705882353, 19.0905882352941, 19.6752941176471,
18.1611764705882, 20.0482352941176, 23.4917647058824, 23.4894117647059,
22.5482352941176, 23.1376470588235, 24.9811764705882, 24.1552941176471,
22.8423529411765, 19.7435294117647, 16.4, 17.3105882352941, 20.5235294117647,
21.0494117647059, 23.1352941176471, 23.9435294117647, 23.9058823529412,
24.9835294117647, 24.6952941176471, 24.0047058823529, 23.3164705882353,
21.5823529411765, 18.3447058823529, 18.1964705882353, 20.0035294117647,
20.7152941176471, 22.5705882352941, 24.6541176470588, 23.2329411764706,
25.0517647058824, 24.3329411764706, 23.5811764705882, 22.9988235294118,
19.4976470588235, 17.3188235294118, 19.5635294117647, 19.0211764705882,
19.7223529411765, 22.6858823529412, 23.9423529411765, 23.6905882352941,
25.7129411764706, 23.9505882352941, 24.4376470588235, 22.6070588235294,
19.8882352941176, 17.2058823529412, 16.4211764705882, 20.02,
21.9458823529412, 21.9341176470588, 22.74, 23.8, 23.9611764705882,
24.4564705882353, 24, 23.2129411764706, 19.4729411764706, 17.7105882352941,
16.9682352941176, 19.0341176470588, 20.2917647058824, 20.7776470588235,
22.9364705882353, 22.7894117647059)), .Names = c("date", "temp"
), row.names = c(NA, -360L), class = "data.frame")
Answer 1:
One option is to use data.table and group the years into periods with cut or findInterval. For the first case, i.e. the mean of each month over all years, we make sure 'date' is of class Date, extract the abbreviated month name with months(), use it as the grouping variable and take the mean of 'temp'. Note that we first convert the data.frame to a data.table in place with setDT(df).
library(data.table)
setDT(df)[, list(Temp = mean(temp)), by = .(Months = months(as.Date(date), abbreviate = TRUE))]
# Months Temp
# 1: Jan 23.90506
# 2: Feb 24.40012
# 3: Mar 23.73714
# 4: Apr 21.68584
# 5: May 19.53863
# 6: Jun 17.90322
# 7: Jul 17.97675
# 8: Aug 19.56051
# 9: Sep 20.90125
#10: Oct 21.96886
#11: Nov 22.86102
#12: Dec 22.92537
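A caveat on the months() call above: it formats dates with the current locale (format code %b), so on a non-English system the labels will not be Jan, Feb, and so on. A minimal sketch of a locale-independent variant, assuming you group on the numeric month from format(date, "%m") and label it with R's built-in month.abb; it also returns the months in calendar order because keyby sorts them. The table demo is synthetic (temp is just the row index) so the block runs on its own; substitute the question's df for real values.

```r
library(data.table)

# Synthetic stand-in for the question's df: 360 monthly rows, 1960-1989.
# temp is the row index, so the expected monthly means are easy to verify.
demo <- data.table(date = seq(as.Date("1960-01-01"), as.Date("1989-12-01"), by = "month"))
demo[, temp := as.numeric(seq_len(.N))]

# Group on the numeric month (%m does not depend on the locale), sorted by keyby.
monthly <- demo[, .(Temp = mean(temp)), keyby = .(m = as.integer(format(date, "%m")))]

# Attach English abbreviations from month.abb and the fixed period label.
monthly[, Month := month.abb[m]]
res <- monthly[, .(Month, Period = "1960-1989", Temp)]
res
```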
For grouping by both period and month, we need a Period column, which can be built with either cut or findInterval. For example, for 5-year windows (1960-1964, 1965-1969, etc.), we create the breakpoints for findInterval (its vec argument) with seq, then map the numeric index returned by findInterval to the labels in 'lbl', built with paste. We then group by 'Month' and 'Period'; the rest is the same as before.
setDT(df)[, c('Month', 'Period') := {
  tmp  <- as.Date(date)
  tmp1 <- as.numeric(format(tmp, '%Y'))          # year
  tmp2 <- months(tmp, abbreviate = TRUE)         # month abbreviation, e.g. "Jan"
  i1   <- seq(min(tmp1), max(tmp1) + 4, by = 5)  # first year of each 5-year window
  i2   <- i1 + 4                                 # last year of each window
  lbl  <- paste(i1, i2, sep = '-')               # labels such as "1960-1964"
  list(tmp2, lbl[findInterval(tmp1, i1)])        # findInterval gives the window index
}]
df[, list(Temp = mean(temp)), by = .(Month, Period)]
# Month Period Temp
# 1: Jan 1960-1964 23.45718
# 2: Feb 1960-1964 23.62400
# 3: Mar 1960-1964 23.51200
# 4: Apr 1960-1964 21.45365
# 5: May 1960-1964 18.35788
# 6: Jun 1960-1964 16.80729
# 7: Jul 1960-1964 17.25106
# 8: Aug 1960-1964 19.42329
# 9: Sep 1960-1964 21.80471
#10: Oct 1960-1964 22.05247
#11: Nov 1960-1964 22.61035
#12: Dec 1960-1964 22.32094
#13: Jan 1965-1969 23.64447
#14: Feb 1965-1969 24.25082
#15: Mar 1965-1969 23.24659
#16: Apr 1965-1969 21.23506
#17: May 1965-1969 19.24706
#18: Jun 1965-1969 18.32235
#19: Jul 1965-1969 17.98282
#20: Aug 1965-1969 19.22376
#21: Sep 1965-1969 21.19247
#22: Oct 1965-1969 21.98682
#23: Nov 1965-1969 22.96776
#24: Dec 1965-1969 22.72612
#25: Jan 1970-1974 24.12165
#26: Feb 1970-1974 24.50659
#27: Mar 1970-1974 23.87412
#28: Apr 1970-1974 21.71153
#29: May 1970-1974 19.75600
#30: Jun 1970-1974 18.83976
#31: Jul 1970-1974 18.05388
#32: Aug 1970-1974 19.20518
#33: Sep 1970-1974 20.59788
#34: Oct 1970-1974 21.41859
#35: Nov 1970-1974 22.03859
#36: Dec 1970-1974 23.15953
#37: Jan 1975-1979 23.71882
#38: Feb 1975-1979 24.49788
#39: Mar 1975-1979 23.93600
#40: Apr 1975-1979 21.02565
#41: May 1975-1979 19.21318
#42: Jun 1975-1979 17.64424
#43: Jul 1975-1979 18.00000
#44: Aug 1975-1979 20.32659
#45: Sep 1975-1979 20.71200
#46: Oct 1975-1979 22.06894
#47: Nov 1975-1979 22.42565
#48: Dec 1975-1979 22.97224
#49: Jan 1980-1984 23.91882
#50: Feb 1980-1984 25.03812
#51: Mar 1980-1984 23.81835
#52: Apr 1980-1984 21.69365
#53: May 1980-1984 20.62071
#54: Jun 1980-1984 18.40965
#55: Jul 1980-1984 18.88071
#56: Aug 1980-1984 19.46376
#57: Sep 1980-1984 20.35553
#58: Oct 1980-1984 22.06565
#59: Nov 1980-1984 23.48047
#60: Dec 1980-1984 22.88965
#61: Jan 1985-1989 24.56941
#62: Feb 1985-1989 24.48329
#63: Mar 1985-1989 24.03576
#64: Apr 1985-1989 22.99553
#65: May 1985-1989 20.03694
#66: Jun 1985-1989 17.39600
#67: Jul 1985-1989 17.69200
#68: Aug 1985-1989 19.72047
#69: Sep 1985-1989 20.74494
#70: Oct 1985-1989 22.22071
#71: Nov 1985-1989 23.64329
#72: Dec 1985-1989 23.48376
# Month Period Temp
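In the same way, we can get 10-year or other windows: replace the 5 in seq() with the new window length and the two 4s with the window length minus one (e.g. by = 10 and + 9).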
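To get several window lengths at once, including the whole 1960-1989 period that the question lists first, the grouping can be wrapped in a function and the results stacked. The sketch below uses cut (the other option mentioned above) instead of findInterval. period_means() is a hypothetical helper, not part of data.table, and demo is a synthetic table (temp is the row index) standing in for the question's df.

```r
library(data.table)

# Synthetic stand-in for the question's df: 360 monthly rows, 1960-1989.
demo <- data.table(date = seq(as.Date("1960-01-01"), as.Date("1989-12-01"), by = "month"))
demo[, temp := as.numeric(seq_len(.N))]

# Hypothetical helper: mean temp per calendar month within consecutive
# windows of `width` years, starting at the first year in the data.
period_means <- function(dt, width) {
  d <- as.data.table(dt)[, .(date, temp)]   # new table, caller's data untouched
  d[, year := as.integer(format(date, "%Y"))]
  brk <- seq(min(d$year), max(d$year) + width, by = width)  # window start years
  lbl <- paste(head(brk, -1), head(brk, -1) + width - 1, sep = "-")
  # right = FALSE makes each interval [start, next start), i.e. whole years
  d[, Period := as.character(cut(year, breaks = brk, labels = lbl, right = FALSE))]
  d[, m := as.integer(format(date, "%m"))]
  res <- d[, .(Temp = mean(temp)), keyby = .(Period, m)]
  res[, .(Month = month.abb[m], Period, Temp)]
}

# Whole record first, then 5-year windows, stacked as in the question's layout.
out <- rbind(period_means(demo, 30), period_means(demo, 5))
head(out, 3)
```

With the question's data, period_means(df, 10) would give the 10-year windows. If the record length is not a multiple of width, the last window keeps its nominal label (e.g. 1988-1994 for width = 7) even though the data stop in 1989.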
Answer 2:
For the first part of the question (averaging all Januaries, all Februaries, etc.), you can use aggregate:
monthly_data <- aggregate(df$temp,by=list(strftime(df$date, "%m")),mean)
cbind(monthly_data[2], Month = month.abb, Period = "1960-1989")
# x Month Period
# 1 23.90506 Jan 1960-1989
# 2 24.40012 Feb 1960-1989
# 3 23.73714 Mar 1960-1989
# 4 21.68584 Apr 1960-1989
# 5 19.53863 May 1960-1989
# 6 17.90322 Jun 1960-1989
# 7 17.97675 Jul 1960-1989
# 8 19.56051 Aug 1960-1989
# 9 20.90125 Sep 1960-1989
# 10 21.96886 Oct 1960-1989
# 11 22.86102 Nov 1960-1989
# 12 22.92537 Dec 1960-1989
For the second part, averaging over a particular range of years, you can try:
from <- 1960
to <- 1964
subset_data <- subset(df, as.numeric(strftime(date, "%Y")) %in% from:to)
subset_monthly_data <- aggregate(subset_data$temp, by = list(strftime(subset_data$date, "%m")), mean)
data.frame(Temp = subset_monthly_data[[2]], Month = month.abb, Period = paste(from, to, sep = "-"))
# Temp Month Period
# 1 23.45718 Jan 1960-1964
# 2 23.62400 Feb 1960-1964
# 3 23.51200 Mar 1960-1964
# 4 21.45365 Apr 1960-1964
# 5 18.35788 May 1960-1964
# 6 16.80729 Jun 1960-1964
# 7 17.25106 Jul 1960-1964
# 8 19.42329 Aug 1960-1964
# 9 21.80471 Sep 1960-1964
# 10 22.05247 Oct 1960-1964
# 11 22.61035 Nov 1960-1964
# 12 22.32094 Dec 1960-1964
I have shown this for the period 1960-1964. You can do the same for any other period by changing from and to.
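Rather than editing from and to by hand, the same subset() plus aggregate() steps can be wrapped in a function and looped over a set of periods, producing the single stacked Month/Period/Temp table the question asks for. A minimal base-R sketch; monthly_mean_between() is a hypothetical helper and demo is a synthetic data frame (temp is the row index) standing in for the question's df.

```r
# Synthetic stand-in for the question's df: 360 monthly rows, 1960-1989.
demo <- data.frame(date = seq(as.Date("1960-01-01"), as.Date("1989-12-01"), by = "month"))
demo$temp <- as.numeric(seq_len(nrow(demo)))

# Hypothetical helper: monthly means of temp for the years from..to inclusive.
monthly_mean_between <- function(data, from, to) {
  yr  <- as.integer(format(data$date, "%Y"))
  sub <- data[yr >= from & yr <= to, ]
  agg <- aggregate(sub$temp, by = list(m = as.integer(format(sub$date, "%m"))), FUN = mean)
  data.frame(Month = month.abb[agg$m], Period = paste(from, to, sep = "-"), Temp = agg$x)
}

# Whole record first, then each 5-year window, stacked into one table.
starts <- seq(1960, 1985, by = 5)
pieces <- c(list(monthly_mean_between(demo, 1960, 1989)),
            lapply(starts, function(s) monthly_mean_between(demo, s, s + 4)))
result <- do.call(rbind, pieces)
head(result)
```

For 3-year or 10-year windows, change by = 5 and s + 4 to the new length and the length minus one.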
Source: https://stackoverflow.com/questions/33447901/create-monthly-mean-by-time-intervals