R - Dplyr grouping with Daylight Saving Time

别等时光非礼了梦想. Submitted on 2021-01-29 15:56:50

Question


I have a data frame f with data at a 10-minute time step, like so:

DateTime           id     value             name
2015-01-01 00:00:00 40497   0                  HY
2015-01-01 00:00:00 51395   589                HY
2015-01-01 00:10:00 51395   583                HY
2015-01-01 00:10:00 40497   0                  HY
2015-01-01 00:20:00 51395   586                HY
2015-01-01 00:20:00 40497   0                  HY
2015-01-01 00:30:00 40497   0                  HY
2015-01-01 00:30:00 51395   586                HY
2015-01-01 00:40:00 40497   0                  HY

The columns id and name are not relevant to what I want to do. The structure of the data frame is as follows:

'data.frame':   9510 obs. of  4 variables:
 $ DateTime        : POSIXct, format: "2019-10-27 00:00:00" "2019-10-27 00:10:00" "2019-10-27 00:20:00" ...
 $ id        : int  40497 40497 40497 40497 40497 40497 40497 40497 40497 40497 ...
 $ value        : int  1445 1444 1433 1431 1430 1431 1427 1411 1411 1410 ...
 $ name: chr  "HY" "HY" "HY" "HY" ...

I want to sum the value column by hour for the year 2019 only; past and future years are not important to me. At first glance this is not hard, and there are many answers to this question. One would do the following:

  library(dplyr)
  library(lubridate)

  f <- f %>%
    mutate(Year = year(DateTime)) %>%
    filter(Year == 2019) %>%                       # keep 2019 only
    mutate(day = floor_date(DateTime, 'day'), h = hour(DateTime)) %>%
    group_by(day, h) %>%
    mutate(sum_col = sum(value)) %>%               # hourly total on every row
    distinct(Year, .keep_all = TRUE) %>%           # keep one row per (day, h)
    ungroup()

The issue is that I have to account for daylight saving time, specifically the changeover on 27/10/2019 at 02:00:00. In my result data frame I need two rows for this hour: the usual one and the one repeated because of the clock change. The data already has "double values" for each 10-minute step between 02:00 and 03:00 and looks like this (in reality with multiple ids):

DateTime           id     value     name
2019-10-27 02:00:00 40497   1403    HY
2019-10-27 02:10:00 40497   1396    HY
2019-10-27 02:20:00 40497   1395    HY
2019-10-27 02:30:00 40497   1396    HY
2019-10-27 02:40:00 40497   1380    HY
2019-10-27 02:50:00 40497   1374    HY
2019-10-27 02:00:00 40497   1373    HY
2019-10-27 02:10:00 40497   1374    HY
2019-10-27 02:20:00 40497   1373    HY
2019-10-27 02:30:00 40497   1373    HY
2019-10-27 02:40:00 40497   1373    HY
2019-10-27 02:50:00 40497   1373    HY
2019-10-27 03:00:00 40497   1367    HY

My question is: how can I group by hour, regardless of name and id, sum the value column, and still get two rows for 2019-10-27 02:00:00, the first for the "real" hour and the second for the daylight saving repeat?
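
One possible direction, offered only as a sketch and not as a confirmed answer: number the repeats of each timestamp within an id, then group by the clock hour and that repeat counter. The sketch assumes the timestamps were parsed without a DST-aware time zone (UTC here), so the two 02:00 blocks hold literally equal values, and that rows for each id are in chronological order. The toy values and the column names pass and hour_start are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Toy data mimicking the repeated 02:00 hour (assumption: parsed as UTC,
# so both 02:00 blocks carry identical timestamp values).
mins <- seq(0L, 50L, by = 10L)
stamps <- c(
  sprintf("2019-10-27 01:%02d:00", mins),
  sprintf("2019-10-27 02:%02d:00", mins),  # first pass through 02:00
  sprintf("2019-10-27 02:%02d:00", mins),  # repeated pass after the clock change
  "2019-10-27 03:00:00"
)
f <- data.frame(
  DateTime = ymd_hms(stamps, tz = "UTC"),
  id       = 40497L,
  value    = c(rep(1L, 6), rep(10L, 6), rep(100L, 6), 1000L),
  name     = "HY",
  stringsAsFactors = FALSE
)

hourly <- f %>%
  filter(year(DateTime) == 2019) %>%
  # Within each id, count repeats of a timestamp in row order:
  # 1 = first occurrence, 2 = the repeated hour.
  group_by(id, DateTime) %>%
  mutate(pass = row_number()) %>%
  ungroup() %>%
  # Sum per clock hour and pass, across all ids and names.
  group_by(hour_start = floor_date(DateTime, "hour"), pass) %>%
  summarise(sum_col = sum(value), .groups = "drop") %>%
  arrange(hour_start, pass)

print(hourly)
```

On this toy input the result has one row each for 01:00 and 03:00 and two rows for 02:00 (pass 1 and pass 2). If the real column is a POSIXct in a DST-aware time zone, the two 02:00 blocks may already be distinct instants that merely print alike, in which case the repeat counter would not separate them and a different grouping key would be needed.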

Source: https://stackoverflow.com/questions/61551392/r-dplyr-grouping-with-daylight-saving-time
