merge two daily time series after summarising on shifted hours

拜拜、爱过 submitted on 2019-12-12 04:42:58

Question


I have a measurement (for instance, solar radiation) indexed by a datetime variable at hourly resolution. What I want to do is sum the measurement values for each day of the year and match them to another data source, also at daily scale (let's say mean outdoor temperature).

However, the second data source is already aggregated from 8:00 am to 8:00 am the next day. I know how to summarise my first variable by calendar day, but I need to do it from 8 to 8 in order to match both measurements.

An example of my data:

set.seed(1L) # to create reproducible data
hourly = data.frame(datetime = seq(from = lubridate::ymd_hm("2017-01-01 01:00"), 
                                   length.out = 168, by = "hour"),
                    value = rpois(168, 10))
daily = data.frame(datetime = seq(from=as.Date("2017-01-01"), length.out = 31, by="day"),
                   value=rnorm(31))

Answer 1:


Expanding my comment into an answer: it's worth noting that the OP has emphasized the words "aggregated from 8:00am to 8:00am the next day".

Mapping non-aligned 24-hour periods to dates

If a 24-hour period is not aligned with midnight, i.e., it does not extend from 00:00 to 24:00 but starts and ends at some other time of day, it is ambiguous which date the period should be associated with.

We can take one of the following:

  1. the date of the day on which the period starts,
  2. the date of the day on which the period ends, or
  3. the date of the day which contains the majority of hours of the period.

Just to illustrate the difference:

# timestamps: 9 am, 10pm, 7 am next day 
x <- lubridate::ymd_hm(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"))
x
[1] "2017-09-12 09:00:00 UTC" "2017-09-12 22:00:00 UTC" "2017-09-13 07:00:00 UTC"
# map timestamps to date on which period starts by shifting back by 8 hours
x + lubridate::hours(-8L)
[1] "2017-09-12 01:00:00 UTC" "2017-09-12 14:00:00 UTC" "2017-09-12 23:00:00 UTC"
# map timestamps to date on which period ends by advancing by 16 hours
x + lubridate::hours(16L)
[1] "2017-09-13 01:00:00 UTC" "2017-09-13 14:00:00 UTC" "2017-09-13 23:00:00 UTC"
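The illustration above covers only the first two conventions. For a period starting at hour h, the start day holds 24 - h of its hours and the following day holds h, so the majority rule picks the start date whenever h < 12; for the 08:00 to 08:00 periods in the question (16 hours vs. 8), options 1 and 3 give the same date. A minimal base-R sketch of all three rules, using a hypothetical helper period_date() that is not part of any package and assumes UTC timestamps:

```r
# Hypothetical helper (not from the answer): map timestamps to a date label
# for 24-hour periods starting at `start_hour`, assuming UTC POSIXct input
# and 0 < start_hour < 24.
period_date <- function(x, start_hour, rule = c("start", "end", "majority")) {
  rule <- match.arg(rule)
  # date on which the period containing each timestamp starts
  start_date <- as.Date(x - start_hour * 3600, tz = "UTC")
  # the start day holds 24 - start_hour hours of the period, the next day
  # start_hour hours; a 12/12 tie is resolved towards the start day
  start_majority <- (24 - start_hour) >= start_hour
  switch(rule,
         start    = start_date,
         end      = start_date + 1,
         majority = if (start_majority) start_date else start_date + 1)
}

x <- as.POSIXct(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"),
                tz = "UTC")
period_date(x, 8, "start")     # all three map to 2017-09-12
period_date(x, 8, "end")       # all three map to 2017-09-13
period_date(x, 8, "majority")  # same as "start": 16 of 24 hours fall on 2017-09-12
```

The "start" and "end" rules reproduce the -8 and +16 hour shifts shown above; only the majority rule depends on where the period begins.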

As there is no further information, let's assume that the daily data were mapped onto the day on which the period starts.

Aggregating and merging

data.table is used for grouping, aggregating, and merging:

library(data.table)
# aggregate data by shifted timestamp
setDT(hourly)[, .(sum.value = sum(value)), 
              by = .(date = as.Date(datetime + lubridate::hours(-8L)))]
         date sum.value
1: 2016-12-31        68
2: 2017-01-01       232
3: 2017-01-02       222
4: 2017-01-03       227
5: 2017-01-04       228
6: 2017-01-05       231
7: 2017-01-06       260
8: 2017-01-07       144

Note that the new date column used for grouping and aggregating is created on the fly in the by parameter (one of the reasons why I prefer data.table).
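For comparison, a base-R sketch of the same aggregation step using aggregate() instead of data.table (a swapped-in technique, not the answer's code). Base R has no equivalent of computing the grouping column inside by, so the shifted date is stored as a helper column first; the example data are rebuilt with as.POSIXct() instead of lubridate::ymd_hm():

```r
# Rebuild the question's example data in base R
set.seed(1L)
hourly <- data.frame(
  datetime = seq(as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)

# Helper column: shift back 8 hours (8 * 3600 seconds) so that every
# 08:00-08:00 period falls on the date on which it starts
hourly$date <- as.Date(hourly$datetime - 8 * 3600, tz = "UTC")

# Sum the hourly values per shifted date
hourly_sum <- aggregate(value ~ date, data = hourly, FUN = sum)
names(hourly_sum)[names(hourly_sum) == "value"] <- "sum.value"
hourly_sum
```

With this data the first group (2016-12-31) holds only the 7 hours from 01:00 to 07:00 on 1 January, and the last group (2017-01-07) the 17 hours from 08:00 on 7 January up to midnight, which is why those two sums are smaller than the others.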

Now the daily data need to be joined. By chaining, both steps can be combined into one statement:

setDT(hourly)[, .(sum.value = sum(value)), 
              by = .(date = as.Date(datetime + lubridate::hours(-8L)))][
                setDT(daily), on = .(date = datetime), nomatch = 0L]
         date sum.value      value
1: 2017-01-01       232 -0.5080862
2: 2017-01-02       222  0.5236206
3: 2017-01-03       227  1.0177542
4: 2017-01-04       228 -0.2511646
5: 2017-01-05       231 -1.4299934
6: 2017-01-06       260  1.7091210
7: 2017-01-07       144  1.4350696

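The parameter nomatch = 0L indicates that we want an inner join here, so only dates present in both tables are kept (recent data.table versions also accept nomatch = NULL for this, with 0 kept for backward compatibility).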
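The join can also be sketched in base R with merge(), assuming the same example data. By default merge() keeps only keys present in both tables, which corresponds to nomatch = 0L; all.y = TRUE keeps every row of daily with NA where no hourly total exists, which is what the data.table default nomatch = NA does when daily is the i table:

```r
# Rebuild the question's example data in base R
set.seed(1L)
hourly <- data.frame(
  datetime = seq(as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)
daily <- data.frame(
  datetime = seq(as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31)
)

# Aggregate hourly values over 08:00-08:00 periods, labelled by start date
hourly_sum <- aggregate(
  data.frame(sum.value = hourly$value),
  by = list(date = as.Date(hourly$datetime - 8 * 3600, tz = "UTC")),
  FUN = sum
)

# Inner join (like nomatch = 0L): only dates present in both tables
inner <- merge(hourly_sum, daily, by.x = "date", by.y = "datetime")

# Keep every daily row, NA where there is no hourly total
# (like the default nomatch = NA with daily as i)
all_daily <- merge(hourly_sum, daily, by.x = "date", by.y = "datetime",
                   all.y = TRUE)
```

The inner join drops 2016-12-31 (no daily value) as well as 8 to 31 January (no hourly data), leaving the same seven dates as the data.table output above.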




Answer 2:


You could do it using cut, e.g.:

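library(lubridate)
library(dplyr)
# 24-hour break points at 08:00 UTC, from the day before the first
# observation to the day after the last one
brk = seq(ymd_hm(paste(as.Date(min(hourly$datetime) - days(1)), "08:00"), tz = "UTC"),
          ymd_hm(paste(as.Date(max(hourly$datetime) + days(1)), "08:00"), tz = "UTC"),
          by = "24 hours")
# assign each hourly timestamp to the [08:00, next 08:00) interval it falls in
hourly$cut <- ymd_hms(cut.POSIXt(hourly$datetime, breaks = brk))
# sum the values in each interval
hourly2 <- hourly %>% group_by(cut) %>% summarize(value = sum(value))
# label each interval with the date on which it starts
hourly2$cut <- as.Date(hourly2$cut)
names(hourly2) <- names(daily)
# stack both series and add up their values for each date
comb <- rbind(hourly2, daily) %>% group_by(datetime) %>% summarize(value = sum(value))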

     datetime       value
       <date>       <dbl>
 1 2016-12-31  52.0000000
 2 2017-01-01 241.5612137
 3 2017-01-02 244.3689032
 4 2017-01-03 271.3156334
 5 2017-01-04 253.8221333
 6 2017-01-05 238.5790170
 7 2017-01-06 220.7118064
 8 2017-01-07 167.5018586
 9 2017-01-08  -0.2962494
10 2017-01-09   0.4126310
# ... with 22 more rows
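Note that rbind() followed by summarize(value = sum(value)) adds each daily value to the hourly total for the same date, so the value column above combines the two measurements, and dates present in only one source keep that source's value alone. To keep the two series side by side instead, the binning can be paired with a join. A base-R sketch under the same example data, keeping the cut() idea but swapping the dplyr pipeline for aggregate() and merge():

```r
# Rebuild the question's example data in base R
set.seed(1L)
hourly <- data.frame(
  datetime = seq(as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)
daily <- data.frame(
  datetime = seq(as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31)
)

# 08:00 UTC break points from the day before the first observation
# to the day after the last one, as in the answer above
first_day <- as.Date(min(hourly$datetime), tz = "UTC") - 1
last_day  <- as.Date(max(hourly$datetime), tz = "UTC") + 1
brk <- seq(as.POSIXct(paste(first_day, "08:00"), tz = "UTC"),
           as.POSIXct(paste(last_day, "08:00"), tz = "UTC"),
           by = "24 hours")

# cut() on POSIXct defaults to right = FALSE, so each bin is
# [08:00, next 08:00) and is labelled by its start time; keep the date part
bin <- cut(hourly$datetime, breaks = brk)
hourly$date <- as.Date(as.character(bin))

# Sum the hourly values per bin
hourly2 <- aggregate(value ~ date, data = hourly, FUN = sum)

# Join instead of rbind() + sum(): one row per shared date,
# one column per measurement
comb <- merge(hourly2, daily, by.x = "date", by.y = "datetime",
              suffixes = c(".hourly", ".daily"))
comb
```

The result has one row per date present in both sources (2017-01-01 to 2017-01-07), with value.hourly and value.daily as separate columns; the bins produced by cut() coincide with the 8-hour shift used in the first answer.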



Answer 3:


Using dplyr and simply shifting each timestamp back by 8 hours:

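library(dplyr)
library(lubridate)
hourly %>%
  # shift back 8 hours so that each 08:00-08:00 period falls on a single date
  mutate(datetime = as_date(datetime - 8 * hours())) %>%
  # stack with the daily series, then add up all values per date
  rbind(daily) %>%
  group_by(datetime) %>%
  summarize_all(sum) %>%
  ungroup() %>%
  arrange(datetime)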

Result:

# A tibble: 32 x 2
     datetime       value
       <date>       <dbl>
 1 2016-12-31  70.0000000
 2 2017-01-01 218.6726454
 3 2017-01-02 244.3821258
 4 2017-01-03 257.7136326
 5 2017-01-04 220.4788443
 6 2017-01-05 230.3729744
 7 2017-01-06 248.5082639
 8 2017-01-07 176.5511818
 9 2017-01-08  -0.8307824
10 2017-01-09  -0.6343781
# ... with 22 more rows


Source: https://stackoverflow.com/questions/46164588/merge-two-daily-time-series-after-summarising-on-shifted-hours
