Create monthly mean by time intervals

时光怂恿深爱的人放手 submitted on 2019-12-07 14:06:22

Question


Sorry if this has already been posted, but I searched hard and could not find anything.

I am working with 30 years of monthly temperature observations, from January 1960 to December 1989. The data look like this:

> head(df)
        date     temp
1 1960-01-01 22.92235
2 1960-02-01 23.07059
3 1960-03-01 23.10941
4 1960-04-01 20.78353
5 1960-05-01 17.45176
6 1960-06-01 17.31765

First, I need to average all Januaries, all Februaries, all Marches, and so on, over the whole period.

Then, I would like to do the same for specific periods of time (3 years, 5 years, 10 years, etc.).

For example,

  • The average of all Jan, Feb, Mar, etc. between 1960 and 1964;
  • The average of all Jan, Feb, Mar, etc. between 1965 and 1969;
  • and so on.

The final result would consist of month, period and temperature, something like this:

Month    Period Temp
  Jan 1960-1989  17
  Feb 1960-1989  12
  Mar 1960-1989   7
  Apr 1960-1989   9
  May 1960-1989  15
  Jun 1960-1989  12
  Jul 1960-1989  17
  Aug 1960-1989  22
  Sep 1960-1989  21
  Oct 1960-1989  21
  Nov 1960-1989  18
  Dec 1960-1989  17
  Jan 1960-1964  17
  Feb 1960-1964  12
  Mar 1960-1964   7
  Apr 1960-1964   9
  May 1960-1964   9
  Jun 1960-1964  11
  Jul 1960-1964  14
  Aug 1960-1964  18
  Sep 1960-1964  13
  Oct 1960-1964  12
  Nov 1960-1964  17
  Dec 1960-1964  11

Any ideas on how to do this?

In case you find it useful, here is a copy of my dataset:

df <- structure(list(date = structure(c(-3653, -3622, -3593, -3562, 
-3532, -3501, -3471, -3440, -3409, -3379, -3348, -3318, -3287, 
-3256, -3228, -3197, -3167, -3136, -3106, -3075, -3044, -3014, 
-2983, -2953, -2922, -2891, -2863, -2832, -2802, -2771, -2741, 
-2710, -2679, -2649, -2618, -2588, -2557, -2526, -2498, -2467, 
-2437, -2406, -2376, -2345, -2314, -2284, -2253, -2223, -2192, 
-2161, -2132, -2101, -2071, -2040, -2010, -1979, -1948, -1918, 
-1887, -1857, -1826, -1795, -1767, -1736, -1706, -1675, -1645, 
-1614, -1583, -1553, -1522, -1492, -1461, -1430, -1402, -1371, 
-1341, -1310, -1280, -1249, -1218, -1188, -1157, -1127, -1096, 
-1065, -1037, -1006, -976, -945, -915, -884, -853, -823, -792, 
-762, -731, -700, -671, -640, -610, -579, -549, -518, -487, -457, 
-426, -396, -365, -334, -306, -275, -245, -214, -184, -153, -122, 
-92, -61, -31, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 
334, 365, 396, 424, 455, 485, 516, 546, 577, 608, 638, 669, 699, 
730, 761, 790, 821, 851, 882, 912, 943, 974, 1004, 1035, 1065, 
1096, 1127, 1155, 1186, 1216, 1247, 1277, 1308, 1339, 1369, 1400, 
1430, 1461, 1492, 1520, 1551, 1581, 1612, 1642, 1673, 1704, 1734, 
1765, 1795, 1826, 1857, 1885, 1916, 1946, 1977, 2007, 2038, 2069, 
2099, 2130, 2160, 2191, 2222, 2251, 2282, 2312, 2343, 2373, 2404, 
2435, 2465, 2496, 2526, 2557, 2588, 2616, 2647, 2677, 2708, 2738, 
2769, 2800, 2830, 2861, 2891, 2922, 2953, 2981, 3012, 3042, 3073, 
3103, 3134, 3165, 3195, 3226, 3256, 3287, 3318, 3346, 3377, 3407, 
3438, 3468, 3499, 3530, 3560, 3591, 3621, 3652, 3683, 3712, 3743, 
3773, 3804, 3834, 3865, 3896, 3926, 3957, 3987, 4018, 4049, 4077, 
4108, 4138, 4169, 4199, 4230, 4261, 4291, 4322, 4352, 4383, 4414, 
4442, 4473, 4503, 4534, 4564, 4595, 4626, 4656, 4687, 4717, 4748, 
4779, 4807, 4838, 4868, 4899, 4929, 4960, 4991, 5021, 5052, 5082, 
5113, 5144, 5173, 5204, 5234, 5265, 5295, 5326, 5357, 5387, 5418, 
5448, 5479, 5510, 5538, 5569, 5599, 5630, 5660, 5691, 5722, 5752, 
5783, 5813, 5844, 5875, 5903, 5934, 5964, 5995, 6025, 6056, 6087, 
6117, 6148, 6178, 6209, 6240, 6268, 6299, 6329, 6360, 6390, 6421, 
6452, 6482, 6513, 6543, 6574, 6605, 6634, 6665, 6695, 6726, 6756, 
6787, 6818, 6848, 6879, 6909, 6940, 6971, 6999, 7030, 7060, 7091, 
7121, 7152, 7183, 7213, 7244, 7274), class = "Date"), temp = c(22.9223529411765, 
23.0705882352941, 23.1094117647059, 20.7835294117647, 17.4517647058824, 
17.3176470588235, 18.0494117647059, 19.6188235294118, 21.3023529411765, 
23.1105882352941, 22.2364705882353, 22.7482352941176, 23.5870588235294, 
24.0023529411765, 23.0094117647059, 22.0176470588235, 19.4917647058824, 
18.1011764705882, 18.3164705882353, 20.0623529411765, 22.8717647058824, 
23.2576470588235, 23.68, 22.3694117647059, 22.9517647058824, 
23.6976470588235, 23.3294117647059, 20.8564705882353, 18.16, 
15.8988235294118, 15.7988235294118, 18.4176470588235, 20.8423529411765, 
20.3247058823529, 22.3070588235294, 22.2035294117647, 24.2235294117647, 
23.6976470588235, 24.4082352941176, 21.1752941176471, 18.1023529411765, 
16.1211764705882, 18.3164705882353, 19.7635294117647, 23.1294117647059, 
22.9964705882353, 23.6552941176471, 22.6964705882353, 23.6011764705882, 
23.6517647058824, 23.7035294117647, 22.4352941176471, 18.5835294117647, 
16.5976470588235, 15.7741176470588, 19.2541176470588, 20.8776470588235, 
20.5729411764706, 21.1729411764706, 21.5870588235294, 22.4576470588235, 
23.6058823529412, 21.84, 21.6694117647059, 19.2458823529412, 
18.7517647058824, 17.7811764705882, 19.4764705882353, 21.9270588235294, 
21.5470588235294, 22.88, 23.2458823529412, 24.2776470588235, 
25.2470588235294, 23.4694117647059, 21.4435294117647, 19.3941176470588, 
18.5447058823529, 17.6, 18.3764705882353, 19.8529411764706, 22.0823529411765, 
22.7294117647059, 23.4011764705882, 23.3611764705882, 24.2505882352941, 
23.2870588235294, 21.9482352941176, 20.5552941176471, 18.0788235294118, 
18.5929411764706, 20.8752941176471, 21.9023529411765, 23.6105882352941, 
22.4070588235294, 21.5635294117647, 23.3129411764706, 22.9741176470588, 
23.3670588235294, 19.6105882352941, 16.9941176470588, 17.7670588235294, 
17.4858823529412, 17.8517647058824, 20.26, 22.1576470588235, 
23.8364705882353, 23.4447058823529, 24.8129411764706, 25.1764705882353, 
24.2694117647059, 21.5035294117647, 20.0458823529412, 18.4694117647059, 
18.4541176470588, 19.5388235294118, 22.02, 20.5364705882353, 
22.9858823529412, 21.9752941176471, 23.7729411764706, 24.0576470588235, 
24.0941176470588, 22.1552941176471, 21.2329411764706, 19.5611764705882, 
17.8788235294118, 18.6823529411765, 20.1541176470588, 21.6258823529412, 
21.5211764705882, 23.9811764705882, 24.8352941176471, 24.5882352941176, 
24.1729411764706, 21.1035294117647, 19.0435294117647, 17.08, 
17.4529411764706, 19.1458823529412, 20.4447058823529, 20.7129411764706, 
21.5047058823529, 22.6952941176471, 23.4364705882353, 23.1, 24.1847058823529, 
19.8105882352941, 19.9847058823529, 20.5188235294118, 17.7658823529412, 
19.4435294117647, 20.7588235294118, 21.7835294117647, 22.7788235294118, 
23.2388235294118, 24.9129411764706, 25.6, 23.5647058823529, 24.0058823529412, 
19.7823529411765, 19.3152941176471, 18.7741176470588, 19.0305882352941, 
20.5576470588235, 21.3611764705882, 21.4247058823529, 23.4811764705882, 
23.6505882352941, 25.1870588235294, 23.3541176470588, 21.4823529411765, 
18.7364705882353, 17.7235294117647, 18.3976470588235, 19.7235294117647, 
21.0741176470588, 21.6094117647059, 22.9635294117647, 22.4011764705882, 
23.4152941176471, 24.7741176470588, 24.3270588235294, 20.7976470588235, 
18.8764705882353, 17.7788235294118, 16.4129411764706, 21.4117647058824, 
22.3317647058824, 21.66, 22.3694117647059, 23.0917647058824, 
24.4541176470588, 23.2847058823529, 23.3164705882353, 21.2529411764706, 
19.1258823529412, 17.3882352941176, 17.3823529411765, 19.0529411764706, 
19.6576470588235, 20.2976470588235, 21.9023529411765, 23.3094117647059, 
24.0117647058824, 25.5611764705882, 24.9129411764706, 21.3964705882353, 
19.9870588235294, 18.3929411764706, 20.9917647058824, 20.3058823529412, 
21.4435294117647, 23.1941176470588, 22.8388235294118, 22.5176470588235, 
24.6317647058824, 24.6541176470588, 24.2, 20.84, 18.4576470588235, 
17.5011764705882, 19.16, 20.54, 20.1517647058824, 22.6776470588235, 
22.7470588235294, 22.7882352941176, 22.0811764705882, 24.2152941176471, 
22.9235294117647, 20.8411764705882, 19.6188235294118, 17.16, 
16.0529411764706, 20.3223529411765, 19.9752941176471, 22.5152941176471, 
22.2705882352941, 23.1541176470588, 23.1047058823529, 23.9517647058824, 
24.8176470588235, 22.18, 20.5023529411765, 17.3505882352941, 
19.1917647058824, 19.9894117647059, 19.0235294117647, 22.8235294117647, 
22.7094117647059, 23.8741176470588, 24.0517647058824, 25.1764705882353, 
23.9235294117647, 21.2929411764706, 20.6117647058824, 17.1305882352941, 
16.3470588235294, 19.6470588235294, 21.3341176470588, 20.2176470588235, 
23.7435294117647, 22.6741176470588, 22.9070588235294, 24.7152941176471, 
23.2905882352941, 20.5776470588235, 18.9635294117647, 19.0658823529412, 
18.8423529411765, 20.0729411764706, 21.3047058823529, 22.1588235294118, 
24.0388235294118, 22.1917647058824, 24.0517647058824, 24.8729411764706, 
23.0117647058824, 23, 21.3094117647059, 19.4105882352941, 20.3470588235294, 
19.4482352941176, 20.0670588235294, 21.6364705882353, 23.4211764705882, 
23.16, 25.4788235294118, 26.4741176470588, 24.0482352941176, 
21.4176470588235, 21.7164705882353, 19.0905882352941, 19.6752941176471, 
18.1611764705882, 20.0482352941176, 23.4917647058824, 23.4894117647059, 
22.5482352941176, 23.1376470588235, 24.9811764705882, 24.1552941176471, 
22.8423529411765, 19.7435294117647, 16.4, 17.3105882352941, 20.5235294117647, 
21.0494117647059, 23.1352941176471, 23.9435294117647, 23.9058823529412, 
24.9835294117647, 24.6952941176471, 24.0047058823529, 23.3164705882353, 
21.5823529411765, 18.3447058823529, 18.1964705882353, 20.0035294117647, 
20.7152941176471, 22.5705882352941, 24.6541176470588, 23.2329411764706, 
25.0517647058824, 24.3329411764706, 23.5811764705882, 22.9988235294118, 
19.4976470588235, 17.3188235294118, 19.5635294117647, 19.0211764705882, 
19.7223529411765, 22.6858823529412, 23.9423529411765, 23.6905882352941, 
25.7129411764706, 23.9505882352941, 24.4376470588235, 22.6070588235294, 
19.8882352941176, 17.2058823529412, 16.4211764705882, 20.02, 
21.9458823529412, 21.9341176470588, 22.74, 23.8, 23.9611764705882, 
24.4564705882353, 24, 23.2129411764706, 19.4729411764706, 17.7105882352941, 
16.9682352941176, 19.0341176470588, 20.2917647058824, 20.7776470588235, 
22.9364705882353, 22.7894117647059)), .Names = c("date", "temp"
), row.names = c(NA, -360L), class = "data.frame")

Answer 1:


One option is to use data.table, grouping the years with cut or findInterval. For the first case, i.e. getting the mean of each month across all years, we convert 'date' to the Date class, extract the abbreviated month names, use them as the grouping variable, and take the mean of 'temp'. Note that we first convert the data.frame to a data.table by reference (setDT(df)).

library(data.table)
# Convert to data.table by reference, then average temp by abbreviated month name
setDT(df)[, list(Temp = mean(temp)), by = .(Months = months(as.Date(date), abbreviate = TRUE))]
#    Months     Temp
# 1:    Jan 23.90506
# 2:    Feb 24.40012
# 3:    Mar 23.73714
# 4:    Apr 21.68584
# 5:    May 19.53863
# 6:    Jun 17.90322
# 7:    Jul 17.97675
# 8:    Aug 19.56051
# 9:    Sep 20.90125
#10:    Oct 21.96886
#11:    Nov 22.86102
#12:    Dec 22.92537
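One caveat: months() returns month names in the current locale, so on a non-English system the labels (and any code that matches on "Jan") will differ. Below is a minimal base-R sketch of a locale-independent alternative, using the built-in English constant month.abb and a factor whose levels fix the Jan-to-Dec order. The toy data are hypothetical, not the question's series.

```r
# Hypothetical toy data: 3 years of monthly values (not the question's data)
toy <- data.frame(date = seq(as.Date("1960-01-01"), by = "month", length.out = 36))
toy$temp <- seq_len(36)

# month.abb is a fixed English constant, so labels do not depend on the locale;
# the factor levels make aggregate() return the months in calendar order
toy$Month <- factor(month.abb[as.integer(format(toy$date, "%m"))], levels = month.abb)

monthly <- aggregate(temp ~ Month, data = toy, FUN = mean)
print(monthly)
```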

For grouping by both month and period, we first need a period column, which can be built with either cut or findInterval. For example, for 5-year windows (1960-1964, 1965-1969, etc.), we build the vector of window start years for findInterval with seq, and then map the numeric index returned by findInterval to the labels in 'lbl', created with paste. We then use 'Month' and 'Period' as the grouping variables; the rest is the same as before.

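setDT(df)[, c('Month', 'Period') := {
  tmp  <- as.Date(date)
  tmp1 <- as.numeric(format(tmp, '%Y'))          # year as a number
  tmp2 <- months(tmp, abbreviate = TRUE)         # abbreviated month name
  i1   <- seq(min(tmp1), max(tmp1) + 4, by = 5)  # start year of each 5-year window
  i2   <- i1 + 4                                 # end year of each window
  lbl  <- paste(i1, i2, sep = '-')               # labels such as "1960-1964"
  list(tmp2, lbl[findInterval(tmp1, i1)])        # index of window -> its label
}]
df[, list(Temp = mean(temp)), .(Month, Period)]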
#     Month    Period     Temp
# 1:   Jan 1960-1964 23.45718
# 2:   Feb 1960-1964 23.62400
# 3:   Mar 1960-1964 23.51200
# 4:   Apr 1960-1964 21.45365
# 5:   May 1960-1964 18.35788
# 6:   Jun 1960-1964 16.80729
# 7:   Jul 1960-1964 17.25106
# 8:   Aug 1960-1964 19.42329
# 9:   Sep 1960-1964 21.80471
#10:   Oct 1960-1964 22.05247
#11:   Nov 1960-1964 22.61035
#12:   Dec 1960-1964 22.32094
#13:   Jan 1965-1969 23.64447
#14:   Feb 1965-1969 24.25082
#15:   Mar 1965-1969 23.24659
#16:   Apr 1965-1969 21.23506
#17:   May 1965-1969 19.24706
#18:   Jun 1965-1969 18.32235
#19:   Jul 1965-1969 17.98282
#20:   Aug 1965-1969 19.22376
#21:   Sep 1965-1969 21.19247
#22:   Oct 1965-1969 21.98682
#23:   Nov 1965-1969 22.96776
#24:   Dec 1965-1969 22.72612
#25:   Jan 1970-1974 24.12165
#26:   Feb 1970-1974 24.50659
#27:   Mar 1970-1974 23.87412
#28:   Apr 1970-1974 21.71153
#29:   May 1970-1974 19.75600
#30:   Jun 1970-1974 18.83976
#31:   Jul 1970-1974 18.05388
#32:   Aug 1970-1974 19.20518
#33:   Sep 1970-1974 20.59788
#34:   Oct 1970-1974 21.41859
#35:   Nov 1970-1974 22.03859
#36:   Dec 1970-1974 23.15953
#37:   Jan 1975-1979 23.71882
#38:   Feb 1975-1979 24.49788
#39:   Mar 1975-1979 23.93600
#40:   Apr 1975-1979 21.02565
#41:   May 1975-1979 19.21318
#42:   Jun 1975-1979 17.64424
#43:   Jul 1975-1979 18.00000
#44:   Aug 1975-1979 20.32659
#45:   Sep 1975-1979 20.71200
#46:   Oct 1975-1979 22.06894
#47:   Nov 1975-1979 22.42565
#48:   Dec 1975-1979 22.97224
#49:   Jan 1980-1984 23.91882
#50:   Feb 1980-1984 25.03812
#51:   Mar 1980-1984 23.81835
#52:   Apr 1980-1984 21.69365
#53:   May 1980-1984 20.62071
#54:   Jun 1980-1984 18.40965
#55:   Jul 1980-1984 18.88071
#56:   Aug 1980-1984 19.46376
#57:   Sep 1980-1984 20.35553
#58:   Oct 1980-1984 22.06565
#59:   Nov 1980-1984 23.48047
#60:   Dec 1980-1984 22.88965
#61:   Jan 1985-1989 24.56941
#62:   Feb 1985-1989 24.48329
#63:   Mar 1985-1989 24.03576
#64:   Apr 1985-1989 22.99553
#65:   May 1985-1989 20.03694
#66:   Jun 1985-1989 17.39600
#67:   Jul 1985-1989 17.69200
#68:   Aug 1985-1989 19.72047
#69:   Sep 1985-1989 20.74494
#70:   Oct 1985-1989 22.22071
#71:   Nov 1985-1989 23.64329
#72:   Dec 1985-1989 23.48376
#    Month    Period     Temp
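The answer mentions that cut would work as well as findInterval but only shows the latter. The sketch below compares the two on a few illustrative years, assuming the same 5-year windows; `years`, `starts`, and `labels` are hypothetical names, not part of the original answer.

```r
# Illustrative years and the 5-year windows used above
years  <- c(1960, 1963, 1964, 1965, 1989)
starts <- seq(1960, 1985, by = 5)               # window start years: 1960, 1965, ..., 1985
labels <- paste(starts, starts + 4, sep = "-")  # "1960-1964", ..., "1985-1989"

# findInterval(): index of the last start year that is <= each year
via_findInterval <- labels[findInterval(years, starts)]

# cut(): with right = FALSE each interval is [start, next start), closed on the left
via_cut <- as.character(cut(years, breaks = c(starts, max(starts) + 5),
                            labels = labels, right = FALSE))

via_findInterval
identical(via_findInterval, via_cut)
```

cut needs one break more than there are labels, which is why the last break (max(starts) + 5) is appended; findInterval only needs the start years.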

In the same way, we can get 10-year or other windows by changing the window width in seq (by = 5) and the matching offsets (+4).
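As a hedged sketch of making the window width a parameter, the hypothetical helper period_means() below maps each year to the start of its window with integer division (%/%) instead of findInterval. It uses base R only, and the toy data are synthetic rather than the question's series.

```r
# Hypothetical toy data: 10 years of monthly values (not the question's data)
toy <- data.frame(date = seq(as.Date("1960-01-01"), by = "month", length.out = 120))
toy$temp <- seq_len(120)

# Hypothetical helper: mean per calendar month within consecutive windows of `width` years
period_means <- function(data, width) {
  year  <- as.integer(format(data$date, "%Y"))
  first <- min(year)
  # Integer division maps each year to the start year of its window
  start <- first + ((year - first) %/% width) * width
  data$Period <- paste(start, start + width - 1, sep = "-")
  data$Month  <- factor(month.abb[as.integer(format(data$date, "%m"))], levels = month.abb)
  res <- aggregate(temp ~ Month + Period, data = data, FUN = mean)
  res[order(res$Period, res$Month), ]
}

five <- period_means(toy, 5)
head(five, 3)
```

Applied to the question's df, period_means(df, 5) or period_means(df, 10) should give the 5- and 10-year tables. If the number of years is not a multiple of width, the last window holds fewer years than its label suggests.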




Answer 2:


For the first part of the question, averaging all Januaries, Februaries, etc., you can use:

# Group by two-digit month ("01".."12"); groups come back sorted, so month.abb lines up
monthly_data <- aggregate(df$temp, by = list(strftime(df$date, "%m")), mean)
cbind(monthly_data[2], Month = month.abb, Period = "1960-1989")

#         x   Month    Period
# 1  23.90506   Jan 1960-1989
# 2  24.40012   Feb 1960-1989
# 3  23.73714   Mar 1960-1989
# 4  21.68584   Apr 1960-1989
# 5  19.53863   May 1960-1989
# 6  17.90322   Jun 1960-1989
# 7  17.97675   Jul 1960-1989
# 8  19.56051   Aug 1960-1989
# 9  20.90125   Sep 1960-1989
# 10 21.96886   Oct 1960-1989
# 11 22.86102   Nov 1960-1989
# 12 22.92537   Dec 1960-1989

For the second part, getting the average over a particular range of years, you can try:

from <- 1960
to <- 1964
# Keep only the rows whose year falls in from:to
subset_data <- subset(df, as.numeric(strftime(df$date, "%Y")) %in% from:to)
# Average temp by two-digit month ("01".."12")
subset_monthly_data <- aggregate(subset_data$temp, by = list(strftime(subset_data$date, "%m")), mean)
cbind(subset_monthly_data[2], Month = month.abb, Period = paste(from, to, sep = "-"))

#         x   Month    Period
# 1  23.45718   Jan 1960-1964
# 2  23.62400   Feb 1960-1964
# 3  23.51200   Mar 1960-1964
# 4  21.45365   Apr 1960-1964
# 5  18.35788   May 1960-1964
# 6  16.80729   Jun 1960-1964
# 7  17.25106   Jul 1960-1964
# 8  19.42329   Aug 1960-1964
# 9  21.80471   Sep 1960-1964
# 10 22.05247   Oct 1960-1964
# 11 22.61035   Nov 1960-1964
# 12 22.32094   Dec 1960-1964

I have shown this for the period 1960-1964. Similarly, you can do this for any other period by changing from and to.
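To assemble the single table the question asks for (the whole-period rows followed by each window's rows), the approach above can be wrapped in a function and applied to every window, then stacked with rbind. A minimal sketch; monthly_means() is a hypothetical helper name, and the toy data are synthetic rather than the question's series.

```r
# Hypothetical toy data: 10 years of monthly values (not the question's data)
toy <- data.frame(date = seq(as.Date("1960-01-01"), by = "month", length.out = 120))
toy$temp <- seq_len(120)

# Hypothetical helper following the approach above: monthly means for the years from:to
monthly_means <- function(data, from, to) {
  year <- as.integer(strftime(data$date, "%Y"))
  sub  <- data[year %in% from:to, ]
  agg  <- aggregate(sub$temp, by = list(strftime(sub$date, "%m")), mean)
  data.frame(Month  = month.abb[as.integer(agg$Group.1)],  # "01" -> "Jan", etc.
             Period = paste(from, to, sep = "-"),
             Temp   = agg$x,
             stringsAsFactors = FALSE)
}

starts <- seq(1960, 1969, by = 5)
result <- do.call(rbind, c(
  list(monthly_means(toy, 1960, 1969)),                     # whole period first
  lapply(starts, function(s) monthly_means(toy, s, s + 4))  # then each 5-year window
))
head(result, 3)
```

For the question's data, the whole period would be monthly_means(df, 1960, 1989) and the windows would start at seq(1960, 1989, by = 5).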



Source: https://stackoverflow.com/questions/33447901/create-monthly-mean-by-time-intervals
