Question
I am looking at a Stack Overflow post: "R: Count Number of Observations within a group".
In that post, daily data is created and then grouped and summed at monthly intervals (as well as weekly intervals):
library(xts)
library(dplyr)
#create data
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")
property_damages_in_dollars <- rnorm(731,100,10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)
# weekly
weekly = final_data %>%
  mutate(date_decision_made = as.Date(date_decision_made)) %>%
  group_by(week = format(date_decision_made, "%W-%y")) %>%
  summarise(total = sum(property_damages_in_dollars, na.rm = TRUE), Count = n())
# monthly
final_data %>%
  mutate(date_decision_made = as.Date(date_decision_made)) %>%
  group_by(month = format(date_decision_made, "%Y-%m")) %>%
  summarise(total = sum(property_damages_in_dollars, na.rm = TRUE), Count = n())
It seems that R's format() function (https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/format) is what tells the computer to "group and sum" the data at a fixed interval.
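Strictly speaking, format() does not group anything by itself: it converts each date into a text label (for example "2014-01" with "%Y-%m", or "00-14" with "%W-%y"), and group_by() then collects the rows that share a label. A minimal base-R sketch of that mechanism, using made-up dates and values, with tapply() standing in for the dplyr pipeline:

```r
# Three illustration dates that straddle a week and a month boundary
dates  <- as.Date(c("2014-01-05", "2014-01-06", "2014-02-01"))
values <- c(10, 20, 5)

# format() only produces labels; it does no grouping on its own
month_labels <- format(dates, "%Y-%m")  # "2014-01" "2014-01" "2014-02"
week_labels  <- format(dates, "%W-%y")  # "00-14"   "01-14"   "04-14"

# Rows with the same label fall into the same group and are summed together
month_totals <- tapply(values, month_labels, sum)
print(month_totals)
```

Any function that maps a date to a label can serve as the grouping key in the same way, which is why a different labelling function is all that is needed for other interval lengths.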
My question: is there a way to "instruct" the computer to "group and sum" by irregular intervals, e.g. by 11-day periods, 3-month periods or 2-year periods? (I guess 3 months could be approximated as 90 days, and 2 years as 730 days.)
Is this possible?
Thanks
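The idea of writing a period as a fixed number of days can be expressed directly with integer division: count the days since an origin date and divide by the bin width. The helper below, bin_start(), is a hypothetical name not taken from either answer; it is a minimal base-R sketch that produces consecutive, equal-width bins, so 90 days only approximates a calendar quarter and 730 days ignores leap years.

```r
# Hypothetical helper: map each date to the first day of its width_days-long bin,
# counting bins from `origin` (defaults to the earliest date)
bin_start <- function(dates, width_days, origin = min(dates)) {
  offset <- as.integer(dates - origin)           # whole days since origin
  origin + (offset %/% width_days) * width_days  # start date of the containing bin
}

dates <- seq(as.Date("2014-01-01"), as.Date("2014-01-23"), by = "day")
bins  <- bin_start(dates, 11)
print(table(bins))  # bins start on 2014-01-01, 2014-01-12 and 2014-01-23
```

The returned dates could then be used as a grouping key, e.g. group_by(group = bin_start(as.Date(date_decision_made), 11)) in the question's pipeline.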
Answer 1:
You can use lubridate's ceiling_date/floor_date to create groups at irregular intervals.
library(dplyr)
library(lubridate)
final_data %>%
  mutate(date_decision_made = as.Date(date_decision_made)) %>%
  group_by(group = ceiling_date(date_decision_made, '11 days')) %>%
  summarise(amount = sum(property_damages_in_dollars))
You can also specify intervals like ceiling_date(date_decision_made, '3 years') or ceiling_date(date_decision_made, '2 months').
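Neither answer mentions it, but base R can also do this without extra packages: the cut() method for Date objects accepts breaks such as "11 days", "3 months" or "2 years" and labels each bin by its start date. Day-based bins are counted from the earliest date, while month and year bins are aligned to the start of a calendar month or year. This may matter because, as far as I know, lubridate rounds multi-unit specifications such as '11 days' relative to the next larger unit, so those bins may restart at each month boundary rather than run continuously; it is worth checking on your own data. A minimal sketch with simulated data shaped like the question's:

```r
# Simulated daily data shaped like the question's final_data (values are random)
set.seed(1)
dates  <- seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "day")
damage <- rnorm(length(dates), mean = 100, sd = 10)

# cut() returns a factor whose levels are the start date of each bin
bins_11d <- cut(dates, "11 days")   # consecutive 11-day windows from 2014-01-01
bins_3m  <- cut(dates, "3 months")  # calendar quarters starting 2014-01-01
bins_2y  <- cut(dates, "2 years")   # 2014-2015, then 2016

# Sum and count per bin, like summarise(total = sum(...), Count = n())
totals_3m <- tapply(damage, bins_3m, sum)
counts_3m <- table(bins_3m)
print(head(data.frame(start = names(totals_3m),
                      total = as.vector(totals_3m),
                      count = as.vector(counts_3m))))
```

Inside a dplyr pipeline the same factor can be used directly, e.g. group_by(group = cut(date_decision_made, "11 days")) after the mutate() step.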
Answer 2:
Using data.table:
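library(data.table)
library(lubridate)
# setDT() converts final_data to a data.table in place; the grouping column is
# passed explicitly via `by` (the original used an empty positional slot instead)
setDT(final_data)[, .(amount = sum(property_damages_in_dollars)),
  by = .(group = ceiling_date(as.IDate(date_decision_made), "11 days"))]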
Source: https://stackoverflow.com/questions/65367282/grouping-and-summing-data-by-irregular-time-intervals-r-language