Question
I have 'my_data_hourly' with 3 stations (actually more) and their hourly temperature values over one year, which gives 8760 hourly measurements per station. Now I want the same structure, but holding the 365 daily (24-hour) means, as 'my_data_daily'.
I tried a for loop, but it didn't work out. I have heard of an aggregate function, but the examples I found rely on a timestamp, which unfortunately I don't have.
my_data_hourly <- structure(c(8.29, 7.96, 8.14, 7.27, 7.37, 7.3, 7.23, 7.53,
7.98, 10.2, 12.39, 14.34, 14.87, 14.39, 12.54, 11.84, 10.3, 10.62,
10.65, 10.56, 10.43, 10.35, 9.85, 9.12, 8.95, 8.82, 8.92, 9.33,
9.44, 9.3, 9.15, 9.37, 9.54, 10.24, 12.13, 12.43, 12.65, 13,
13.18, 13.58, 13.64, 13.75, 13.85, 13.94, 13.79, 13.84, 13.94,
14.26, 24.93, 24.64, 23.67, 21.46, 21.33, 20.83, 21.12, 21.1,
23.75, 25.39, 30.72, 30.71, 30.81, 30.92, 32.61, 32.37, 32.49,
30.68, 30.23, 30.45, 28.1, 26.9, 25.09, 25.07, 24.59, 24.22,
23.05, 22.21, 22.07, 21.6, 21.24, 21.22, 21.85, 24.87, 28.85,
29.42, 30.82, 30.97, 31.32, 30.81, 30.83, 29.9, 30.01, 30.31,
30, 27.91, 25.78, 25.88, 8.78, 8.47, 8.49, 7.65, 8.63, 9.02,
9.02, 8.11, 7.63, 9.19, 11.25, 12.24, 13.62, 12.09, 10.6, 11.1,
10.16, 10.44, 9.58, 10.04, 10.01, 10.23, 9.51, 9.2, 9.34, 9.6,
9.4, 9.45, 9.36, 9.26, 9.3, 9.46, 9.58, 9.89, 10.6, 11.04, 12.1,
12.61, 13.12, 13.47, 13.55, 13.51, 13.63, 13.84, 13.93, 14.17,
13.97, 13.86), .Dim = c(48L, 3L), .Dimnames = list(NULL, c("station1",
"station2", "station3")))
hourly_measure Station1 Station2 Station3
[1,] 8.29 24.93 8.78
[2,] 7.96 24.64 8.47
[3,] 8.14 23.67 8.49
[4,] 7.27 21.46 7.65
[5,] 7.37 21.33 8.63
[6,] 7.30 20.83 9.02
[7,] 7.23 21.12 9.02
[8,] 7.53 21.10 8.11
[9,] 7.98 23.75 7.63
[10,] 10.20 25.39 9.19
[11,] 12.39 30.72 11.25
[12,] 14.34 30.71 12.24
[13,] 14.87 30.81 13.62
[14,] 14.39 30.92 12.09
[15,] 12.54 32.61 10.60
[16,] 11.84 32.37 11.10
[17,] 10.30 32.49 10.16
[18,] 10.62 30.68 10.44
[19,] 10.65 30.23 9.58
[20,] 10.56 30.45 10.04
[21,] 10.43 28.10 10.01
[22,] 10.35 26.90 10.23
[23,] 9.85 25.09 9.51
[24,] 9.12 25.07 9.20
[25,] 8.95 24.59 9.34
[26,] 8.82 24.22 9.60
[27,] 8.92 23.05 9.40
[28,] 9.33 22.21 9.45
[29,] 9.44 22.07 9.36
[30,] 9.30 21.60 9.26
[31,] 9.15 21.24 9.30
[32,] 9.37 21.22 9.46
[33,] 9.54 21.85 9.58
[34,] 10.24 24.87 9.89
[35,] 12.13 28.85 10.60
[36,] 12.43 29.42 11.04
[37,] 12.65 30.82 12.10
[38,] 13.00 30.97 12.61
[39,] 13.18 31.32 13.12
[40,] 13.58 30.81 13.47
[41,] 13.64 30.83 13.55
[42,] 13.75 29.90 13.51
[43,] 13.85 30.01 13.63
[44,] 13.94 30.31 13.84
[45,] 13.79 30.00 13.93
[46,] 13.84 27.91 14.17
[47,] 13.94 25.78 13.97
[48,] 14.26 25.88 13.86
So in theory I want the mean of my_data_hourly[1:24,1] in my_data_daily[1,1] and the mean of my_data_hourly[25:48,1] in my_data_daily[2,1].
Answer 1:
These are time series, and it is probably best to use a time series representation for them, which will facilitate plotting and other time series processing.
1) ts Suppose your data is the matrix m shown reproducibly in the Note at the end. Convert it to a ts time series with frequency 24 and then aggregate it as shown. No packages are used.
tt <- ts(m, frequency = 24)
aggregate(tt, 1, mean)
giving:
Time Series:
Start = 1
End = 2
Frequency = 1
Station1 Station2 Station3
1 10.06333 26.89042 9.794167
2 11.71000 25.40542 11.585000
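As a sanity check on what aggregate does with a frequency-24 series, the following sketch compares its result with plain column means over each block of 24 rows, which is what the question asks for. The matrix m_toy and its values are made up for illustration; they are not the data from the Note.

```r
# Hypothetical toy data: 48 "hours" x 2 stations (not the question's data)
m_toy <- cbind(A = 1:48, B = seq(0.5, 24, by = 0.5))

# Same steps as above: frequency 24, then aggregate down to frequency 1
tt_toy <- ts(m_toy, frequency = 24)
daily <- aggregate(tt_toy, 1, mean)

# Manual check: average rows 1-24 and rows 25-48 column by column
manual <- rbind(colMeans(m_toy[1:24, ]), colMeans(m_toy[25:48, ]))

# Compare the raw numbers only, ignoring the time series attributes
isTRUE(all.equal(as.numeric(daily), as.numeric(manual)))
```

If the comparison returns TRUE, each row of the aggregated series is the mean of one 24-row block of the input.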
2) zooreg An alternative is to create zooreg objects using the zoo package.
library(zoo)
z <- zooreg(m, frequency = 24)
aggregate(z, as.integer, mean)
giving:
Station1 Station2 Station3
1 10.06333 26.89042 9.794167
2 11.71000 25.40542 11.585000
Note
Lines <- "
Station1 Station2 Station3
8.29 24.93 8.78
7.96 24.64 8.47
8.14 23.67 8.49
7.27 21.46 7.65
7.37 21.33 8.63
7.30 20.83 9.02
7.23 21.12 9.02
7.53 21.10 8.11
7.98 23.75 7.63
10.20 25.39 9.19
12.39 30.72 11.25
14.34 30.71 12.24
14.87 30.81 13.62
14.39 30.92 12.09
12.54 32.61 10.60
11.84 32.37 11.10
10.30 32.49 10.16
10.62 30.68 10.44
10.65 30.23 9.58
10.56 30.45 10.04
10.43 28.10 10.01
10.35 26.90 10.23
9.85 25.09 9.51
9.12 25.07 9.20
8.95 24.59 9.34
8.82 24.22 9.60
8.92 23.05 9.40
9.33 22.21 9.45
9.44 22.07 9.36
9.30 21.60 9.26
9.15 21.24 9.30
9.37 21.22 9.46
9.54 21.85 9.58
10.24 24.87 9.89
12.13 28.85 10.60
12.43 29.42 11.04
12.65 30.82 12.10
13.00 0.97 12.61
13.18 31.32 13.12
13.58 30.81 13.47
13.64 30.83 13.55
13.75 29.90 13.51
13.85 30.01 13.63
13.94 30.31 13.84
13.79 30.00 13.93
13.84 27.91 14.17
13.94 25.78 13.97
14.26 25.88 13.86"
m <- as.matrix(read.table(text = Lines, header = TRUE))
Answer 2:
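One dplyr possibility, assuming df is a data frame whose first column is the hour index (hourly_measure) and whose number of rows is a multiple of 24, could be:
library(dplyr)
df %>%
  # gl() labels each run of 24 consecutive rows with one Period level
  group_by(Period = gl(n()/24, 24)) %>%
  # average every column except the first one (the hour index)
  summarise_at(-1, mean)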
Period Station1 Station2 Station3
<fct> <dbl> <dbl> <dbl>
1 1 10.1 26.9 9.79
2 2 11.7 25.4 11.6
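The same grouping idea can also be sketched in base R on a plain matrix, without dplyr. This is an alternative, not the method used in the answer above: it builds the grouping factor with gl() and uses rowsum() to add up the rows per group. The matrix m_toy is made up for illustration, and the sketch assumes the number of rows is a multiple of 24.

```r
# Hypothetical toy data: 48 hourly rows x 2 stations (not the question's data)
m_toy <- cbind(A = 1:48, B = seq(0.5, 24, by = 0.5))

# One factor level per run of 24 consecutive rows, as in the gl() call above
period <- gl(nrow(m_toy) / 24, 24)

# rowsum() adds up the rows within each group; dividing each row of the
# result by its group size (24 here) turns the sums into means
daily <- rowsum(m_toy, period) / as.vector(table(period))
daily
```

For a full year of hourly data (8760 rows), the same code would give a 365-row matrix with one row of daily means per day.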
Source: https://stackoverflow.com/questions/56476532/how-to-aggregate-hourly-values-into-24h-average-means-without-timestamp