Question
I have a measurement (for instance solar radiation) indexed by a datetime variable, with hourly timestamps. I want to sum the measurement values for each day of the year and match the result to another data source that is also at daily scale (let's say mean outdoor temperature).
However, the second data source is already aggregated from 8:00am to 8:00am the next day. I know how to summarise my first variable by standard calendar day, but I need to do it from 8 to 8 in order to match both measurements.
An example of my data:
set.seed(1L) # to create reproducible data
hourly = data.frame(datetime = seq(from = lubridate::ymd_hm("2017-01-01 01:00"),
length.out = 168, by = "hour"),
value = rpois(168, 10))
daily = data.frame(datetime = seq(from=as.Date("2017-01-01"), length.out = 31, by="day"),
value=rnorm(31))
Answer 1:
Expanding my comment into an answer: it is worth noting that the OP has emphasized the words "aggregated from 8:00am to 8:00am the next day".
Mapping non-aligned 24-hour periods to dates
If a 24 hour period is not aligned with midnight, i.e., does not extend from 00:00 to 24:00 but starts and ends sometime during the day, it is ambiguous which date is associated with that period.
We can take either
- the date of the day on which the period starts,
- the date of the day on which the period ends, or
- the date of the day which contains the majority of hours of the period.
Just to illustrate the difference:
# timestamps: 9 am, 10pm, 7 am next day
x <- lubridate::ymd_hm(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"))
x
[1] "2017-09-12 09:00:00 UTC" "2017-09-12 22:00:00 UTC" "2017-09-13 07:00:00 UTC"
# map timestamps to date on which period starts by shifting back by 8 hours
x + lubridate::hours(-8L)
[1] "2017-09-12 01:00:00 UTC" "2017-09-12 14:00:00 UTC" "2017-09-12 23:00:00 UTC"
# map timestamps to date on which period ends by advancing by 16 hours
x + lubridate::hours(16L)
[1] "2017-09-13 01:00:00 UTC" "2017-09-13 14:00:00 UTC" "2017-09-13 23:00:00 UTC"
As there is no other information, let's assume that the daily data were mapped onto the day on which the period starts.
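The three conventions listed above can be sketched in base R as follows. The helper `period_date()` and its arguments are hypothetical names introduced only for this illustration, and the timestamps are assumed to be in UTC. For a period that starts at 08:00, 16 of its 24 hours fall on the start day, so the "majority" rule gives the same date as the "start" rule.

```r
# Hypothetical helper (not part of any package): map timestamps to the date
# associated with their 24-hour period, assuming UTC timestamps.
period_date <- function(x, start_hour = 8,
                        rule = c("start", "end", "majority")) {
  rule <- match.arg(rule)
  # Shifting back by start_hour aligns the period with the calendar day
  start_date <- as.Date(x - start_hour * 3600, tz = "UTC")
  switch(rule,
    start    = start_date,
    end      = start_date + 1,
    # More than half of the period lies on the start day if it begins before noon
    majority = if (start_hour < 12) start_date else start_date + 1
  )
}

x <- as.POSIXct(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"),
                tz = "UTC")
d_start    <- period_date(x, rule = "start")
d_end      <- period_date(x, rule = "end")
d_majority <- period_date(x, rule = "majority")
```

All three timestamps belong to the same 8-to-8 period, so each rule returns a single date for all of them.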
Aggregating and merging
For grouping, aggregating, and merging, data.table is used:
library(data.table)
# aggregate data by shifted timestamp
setDT(hourly)[, .(sum.value = sum(value)),
by = .(date = as.Date(datetime + lubridate::hours(-8L)))]
         date sum.value
1: 2016-12-31        68
2: 2017-01-01       232
3: 2017-01-02       222
4: 2017-01-03       227
5: 2017-01-04       228
6: 2017-01-05       231
7: 2017-01-06       260
8: 2017-01-07       144
Note that the new date column, which is used for grouping and aggregating, is created on the fly in the by parameter (one of the reasons why I prefer data.table).
Now the daily data need to be joined. With chaining, this can be combined into one statement:
setDT(hourly)[, .(sum.value = sum(value)),
by = .(date = as.Date(datetime + lubridate::hours(-8L)))][
setDT(daily), on = .(date = datetime), nomatch = 0L]
         date sum.value      value
1: 2017-01-01       232 -0.5080862
2: 2017-01-02       222  0.5236206
3: 2017-01-03       227  1.0177542
4: 2017-01-04       228 -0.2511646
5: 2017-01-05       231 -1.4299934
6: 2017-01-06       260  1.7091210
7: 2017-01-07       144  1.4350696
The parameter nomatch = 0L indicates that we want an inner join here, so only dates present in both data sets are kept.
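For readers without data.table, the same shift, aggregate, and inner-join steps can be sketched in base R. This is only an illustration under the same assumption as above (each 8-to-8 period is labelled with its start date, timestamps in UTC). The intermediate names `hourly_sum` and `merged` are made up for this sketch.

```r
# Recreate the example data from the question with base R only
set.seed(1L)
hourly <- data.frame(
  datetime = seq(as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)
daily <- data.frame(
  datetime = seq(as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31)
)

# Shift back by 8 hours so each 8-to-8 period falls on one calendar date
hourly$date <- as.Date(hourly$datetime - 8 * 3600, tz = "UTC")

# Sum the hourly values per shifted date
hourly_sum <- aggregate(value ~ date, data = hourly, FUN = sum)
names(hourly_sum)[2] <- "sum.value"

# merge() defaults to an inner join: only dates present in both are kept
merged <- merge(hourly_sum, daily, by.x = "date", by.y = "datetime")
```

Passing all.x = TRUE to merge() would keep the unmatched 2016-12-31 row as well, with NA in the daily column.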
Answer 2:
You could do it using cut, e.g.:
library(lubridate)
library(dplyr)
# daily breaks at 08:00 UTC, covering the whole range of the hourly data
brk = seq(ymd_hm(paste(as.Date(min(hourly$datetime) - days(1)), "08:00"), tz = "UTC"),
          ymd_hm(paste(as.Date(max(hourly$datetime) + days(1)), "08:00"), tz = "UTC"),
          by = "24 hours")
hourly$cut <- ymd_hms(cut.POSIXt(hourly$datetime, breaks = brk))
hourly2 <- hourly %>% group_by(cut) %>% summarize(value = sum(value))
hourly2$cut <- as.Date(hourly2$cut)
names(hourly2) <- names(daily)
comb <- rbind(hourly2, daily) %>% group_by(datetime) %>% summarize(value = sum(value))
datetime value
<date> <dbl>
1 2016-12-31 52.0000000
2 2017-01-01 241.5612137
3 2017-01-02 244.3689032
4 2017-01-03 271.3156334
5 2017-01-04 253.8221333
6 2017-01-05 238.5790170
7 2017-01-06 220.7118064
8 2017-01-07 167.5018586
9 2017-01-08 -0.2962494
10 2017-01-09 0.4126310
... with 22 more rows
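As a rough check, the bins produced by cut can be compared with the dates obtained by shifting the timestamps back by 8 hours. The sketch below uses base R only and assumes UTC timestamps; `day_of`, `date_cut`, and `date_shift` are names invented for the illustration. By default cut on date-times uses intervals that are closed on the left, so a timestamp at exactly 08:00 opens a new period.

```r
set.seed(1L)
hourly <- data.frame(
  datetime = seq(as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)

# Hypothetical shorthand for extracting the UTC calendar date
day_of <- function(x) as.Date(x, tz = "UTC")

# Daily breaks at 08:00 UTC spanning the hourly data
brk <- seq(
  as.POSIXct(paste(day_of(min(hourly$datetime)) - 1, "08:00"), tz = "UTC"),
  as.POSIXct(paste(day_of(max(hourly$datetime)) + 1, "08:00"), tz = "UTC"),
  by = "day"
)

# Each level is labelled with the start of its interval [08:00, next 08:00)
bin <- cut(hourly$datetime, breaks = brk)
date_cut <- as.Date(substr(as.character(bin), 1, 10))

# The same period dates obtained by shifting back 8 hours
date_shift <- as.Date(hourly$datetime - 8 * 3600, tz = "UTC")

all(date_cut == date_shift)
```

If the two approaches agree, the choice between them is mostly a matter of taste; the shift needs no explicit break vector.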
Answer 3:
Using dplyr, simply shift the timestamps by subtracting 8 hours before converting them to dates:
library(dplyr)
library(lubridate)
hourly %>% mutate(datetime = as_date(datetime - 8 * hours())) %>%
rbind(daily) %>%
group_by(datetime) %>%
summarize_all(sum) %>%
ungroup() %>%
arrange(datetime)
Result:
# A tibble: 32 x 2
datetime value
<date> <dbl>
1 2016-12-31 70.0000000
2 2017-01-01 218.6726454
3 2017-01-02 244.3821258
4 2017-01-03 257.7136326
5 2017-01-04 220.4788443
6 2017-01-05 230.3729744
7 2017-01-06 248.5082639
8 2017-01-07 176.5511818
9 2017-01-08 -0.8307824
10 2017-01-09 -0.6343781
# ... with 22 more rows
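Note that rbind followed by sum adds the daily values to the hourly sums in a single column. If the two measurements should instead stay side by side, as the question's wording ("match") suggests, a join is one option. The sketch below assumes dplyr and lubridate are installed; the column names `radiation_sum` and `temperature` are assumptions taken from the question's example, not from the data.

```r
library(dplyr)
library(lubridate)

set.seed(1L)
hourly <- data.frame(
  datetime = seq(from = ymd_hm("2017-01-01 01:00"), length.out = 168, by = "hour"),
  value = rpois(168, 10)
)
daily <- data.frame(
  datetime = seq(from = as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31)
)

matched <- hourly %>%
  # label each 8-to-8 period with its start date
  mutate(date = as_date(datetime - hours(8))) %>%
  group_by(date) %>%
  summarise(radiation_sum = sum(value)) %>%
  # keep only dates present in both sources, one column per measurement
  inner_join(rename(daily, temperature = value), by = c("date" = "datetime"))
```

Replacing inner_join with left_join or full_join keeps unmatched dates from one or both sources, with NA in the missing column.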
Source: https://stackoverflow.com/questions/46164588/merge-two-daily-time-series-after-summarising-on-shifted-hours