Question
I am stuck in R. I need to aggregate per-minute data into 30-minute intervals. My data frame looks like this:
Date Time Power
2008-10-11 23:57:00 0.376
2008-10-11 23:58:00 0.374
2008-10-11 23:59:00 0.374
2008-10-12 0:00:00 0.334
2008-10-12 0:01:00 0.330
...
2008-12-13 19:24:00 1.390
2008-12-13 19:25:00 1.370
2008-12-13 19:26:00 1.368
2008-12-13 19:27:00 1.362
2008-12-13 19:28:00 1.352
2008-12-13 19:29:00 1.360
...
Basically, I have 500,979 rows of data for 2008, with one power value per minute. To do time series analysis, I need to aggregate the data into 30-minute intervals, which would give me around 17,520 rows in the end. My questions are:
Is it possible to plot a time series when some values in the Power column are missing?
How can I aggregate the Power values by 30 minutes (e.g., 0:00:00 to 0:29:00; 0:30:00 to 0:59:00)?
Thank you very much!
Answer 1:
With magrittr and dplyr, you can generate a new column to hold a grouping variable giving which time unit the measurement is in, then use summarise to calculate whatever aggregate metric you want. (Note that %<>% saves the result of the pipe back into the original variable.)
I am not sure what format your time is in already. When I read it in, it is just a character string, so converting it to numeric hours takes a bit of doing. If it is already in some other format for you, that step would likely be easier.
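    library(dplyr)
    library(magrittr)  # provides %<>%

    df %<>%
      mutate(
        # convert "H:M:S" strings to decimal hours
        cleanTime = strsplit(Time, ":") %>%
          sapply(function(x) {
            x <- as.numeric(x)
            x[1] + x[2] / 60 + x[3] / (60 * 60)
          }),
        # round down to the nearest half hour
        roundTime = floor(cleanTime * 2) / 2
      )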
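To make the arithmetic concrete, here is a small standalone sketch of the same conversion applied to single time strings. The helper name to_half_hour is hypothetical and not part of the answer's code; it only mirrors the cleanTime and roundTime steps.

```r
# Hypothetical helper: "H:M:S" string -> start of its half-hour bin, in decimal hours
to_half_hour <- function(time_str) {
  parts <- as.numeric(strsplit(time_str, ":")[[1]])
  hours <- parts[1] + parts[2] / 60 + parts[3] / 3600
  # doubling, flooring, and halving rounds down to a multiple of 0.5
  floor(hours * 2) / 2
}

to_half_hour("23:57:00")  # 23.5
to_half_hour("0:01:00")   # 0
to_half_hour("19:29:00")  # 19
```

A value of 23.5 therefore stands for the bin from 23:30:00 to 23:59:59, and 19 for the bin from 19:00:00 to 19:29:59.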
Then, group by the generated variable, and take whatever aggregate stats you want:
    df %>%
      group_by(Date, roundTime) %>%
      summarise(
        avgPower = mean(Power),
        sumPower = sum(Power)
      )
gives:
            Date roundTime  avgPower sumPower
          <date>     <dbl>     <dbl>    <dbl>
    1 2008-10-11      23.5 0.3746667    1.124
    2 2008-10-12       0.0 0.3320000    0.664
    3 2008-12-13      19.0 1.3670000    8.202
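The question also mentions missing Power values. In base R, mean() and sum() return NA when any input is NA unless na.rm = TRUE is passed, so the summarise() call above would yield NA for any bin that contains a missing minute. A minimal illustration with made-up values (not taken from the asker's data):

```r
# Made-up Power readings for one bin, with one missing minute
power <- c(1.390, NA, 1.368)

mean(power)                # NA: a single missing value propagates
mean(power, na.rm = TRUE)  # 1.379, the average of the two observed values
sum(power, na.rm = TRUE)   # 2.758
```

Whether dropping missing minutes is appropriate depends on the analysis; a sum over a bin with gaps, for instance, is not directly comparable to a sum over a complete bin.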
Answer 2:
With dplyr you can do the following (you can change mean to your own aggregation function):
    df %>%
      mutate(DateTime = as.POSIXct(paste(Date, Time))) %>%
      group_by(DateTime = cut(DateTime, breaks = "30 min")) %>%
      summarize(Power = mean(Power))
This results in the following output:
                 DateTime  Power
                   <fctr>  <dbl>
    1 2008-10-11 23:57:00 0.3576
    2 2008-12-13 18:57:00 1.3760
    3 2008-12-13 19:27:00 1.3580
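Note that cut() with breaks = "30 min" starts its bins at the first timestamp in the data, which is why the bins in the output above begin at 23:57 rather than on the half hour. If bins aligned to the clock are wanted (0:00 to 0:29, 0:30 to 0:59), one possible approach is to floor the timestamps to multiples of 1800 seconds. The following is a minimal sketch in base R rather than dplyr, assuming the timestamps can be parsed as UTC; the sample data frame and the names dt, bin, and agg are illustrative and not from the original answer.

```r
# Illustrative sample in the same shape as the question's data
df <- data.frame(
  Date  = c("2008-10-11", "2008-10-11", "2008-10-11", "2008-10-12", "2008-10-12"),
  Time  = c("23:57:00", "23:58:00", "23:59:00", "0:00:00", "0:01:00"),
  Power = c(0.376, 0.374, 0.374, 0.334, 0.330),
  stringsAsFactors = FALSE
)

# Assumption: parse in UTC so that flooring epoch seconds lands on clock half-hours
dt <- as.POSIXct(paste(df$Date, df$Time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Floor each timestamp to the start of its 30-minute (1800 s) bin
df$bin <- as.POSIXct(floor(as.numeric(dt) / 1800) * 1800,
                     origin = "1970-01-01", tz = "UTC")

# Mean power per clock-aligned bin
agg <- aggregate(Power ~ bin, data = df, FUN = mean)
print(agg)
```

With this sample, the three readings before midnight fall into the bin starting at 23:30 and the two after midnight into the bin starting at 00:00. For a local time zone whose offset from UTC is not a multiple of 30 minutes, the flooring step would need to be adjusted.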
Source: https://stackoverflow.com/questions/40252525/how-to-aggregate-every-30-minutes-in-r