Question
I have been struggling with this for a while now: I have a data frame that contains 5-minute measurements (for around 6 months) of different parameters. I want to aggregate them and get the mean of every parameter every 30 min. Here is a short example:
TIMESTAMP <- c("2015-12-31 0:30", "2015-12-31 0:35","2015-12-31 0:40", "2015-12-31 0:45", "2015-12-31 0:50", "2015-12-31 0:55", "2015-12-31 1:00", "2015-12-31 1:05", "2015-12-31 1:10", "2015-12-31 1:15", "2015-12-31 1:20", "2015-12-31 1:25", "2015-12-31 1:30")
value1 <- c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
mymet <- data.frame(TIMESTAMP, value1)
mymet$TIMESTAMP <- as.POSIXct(mymet$TIMESTAMP, format = "%Y-%m-%d %H:%M")
halfhour <- aggregate(mymet, list(TIME = cut(mymet$TIMESTAMP, breaks = "30 mins")),
mean, na.rm = TRUE)
What I want is the average of the values from 00:35 to 01:00, labelled DATE-1:00AM. What I get instead is the average of the values from 00:30 to 00:55, labelled DATE-12:30AM.
How can I change the function to give me the values that I want?
Answer 1:
The trick (I think) is to look at when your first observation starts. If the first observation is at 00:35 and you cut into 30-minute intervals, the intervals follow the logic you want: each bin runs from :35 to :00 or from :05 to :30. As for the names of the breaks, each label is the start time of its bin, so adding 25 minutes to it gives the label you want. Here is an example covering 2015-01-01 to 2015-06-01:
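library(lubridate)
library(dplyr)
# Five-minute timestamps from 2015-01-01 00:00 to 2015-06-01 23:55
TIMESTAMP <- seq(ymd_hm('2015-01-01 00:00'), ymd_hm('2015-06-01 23:55'), by = '5 min')
TIMESTAMP <- data.frame(obs = seq_along(TIMESTAMP), TS = TIMESTAMP)
TIMESTAMP <- TIMESTAMP[-(1:7), ] # drop 00:00 to 00:30 so the series starts at 00:35
# Bin into 30-minute intervals; each label is the start time of its bin
TIMESTAMP$Breaks <- cut(TIMESTAMP$TS, breaks = "30 mins")
# Shift each label by 25 minutes so it names the last observation in the bin
TIMESTAMP$Breaks <- ymd_hms(as.character(TIMESTAMP$Breaks)) + (25*60)
Averages <- TIMESTAMP %>% group_by(Breaks) %>% summarise(MeanObs = mean(obs, na.rm = TRUE))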
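As a variation on the same idea, here is a minimal base R sketch (an illustration, not part of the original answer) that skips the label shift: it rounds each timestamp up to the next half-hour boundary, so every bin is labelled by its end time. It uses the sample data from the question. The names bin_seconds and half_hour_end are made up for this example, and tz = "UTC" is an assumption chosen to keep the result reproducible.

```r
# Sample data from the question, built with data.frame() so both columns exist.
TIMESTAMP <- c("2015-12-31 0:30", "2015-12-31 0:35", "2015-12-31 0:40",
               "2015-12-31 0:45", "2015-12-31 0:50", "2015-12-31 0:55",
               "2015-12-31 1:00", "2015-12-31 1:05", "2015-12-31 1:10",
               "2015-12-31 1:15", "2015-12-31 1:20", "2015-12-31 1:25",
               "2015-12-31 1:30")
value1 <- c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
mymet <- data.frame(TIMESTAMP, value1)
# Assumption: timestamps are treated as UTC.
mymet$TIMESTAMP <- as.POSIXct(mymet$TIMESTAMP, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Round every timestamp up to the next multiple of 30 minutes (in epoch seconds).
# A timestamp already on a boundary, such as 01:00, stays where it is.
# This lines up with clock half-hours only when the time zone offset is a
# multiple of 30 minutes, which holds for UTC.
bin_seconds <- 30 * 60
mymet$half_hour_end <- as.POSIXct(
  ceiling(as.numeric(mymet$TIMESTAMP) / bin_seconds) * bin_seconds,
  origin = "1970-01-01", tz = "UTC"
)

# Mean of value1 per bin, with each bin labelled by its end time.
halfhour <- aggregate(value1 ~ half_hour_end, data = mymet, FUN = mean)
halfhour
```

On the sample data this gives three bins ending at 00:30, 01:00 and 01:30, with means of 45, about 66.67 and about 84.83. The first bin holds only the 00:30 observation, because that value closes the half hour before the sample starts.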
Answer 2:
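If you get mymet constructed properly (data.frame(TIMESTAMP, value1) gives the two columns, whereas as.data.frame(TIMESTAMP, value1) does not create a value1 column), you can cut TIMESTAMP into bins (which you can do with cut.POSIXt) so you can aggregate: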
mymet$half_hour <- cut(mymet$TIMESTAMP, breaks = "30 min")
aggregate(value1 ~ half_hour, mymet, mean)
## half_hour value1
## 1 2015-12-31 00:30:00 73.33333
## 2 2015-12-31 01:00:00 80.16667
## 3 2015-12-31 01:30:00 33.00000
Data
mymet <- structure(list(TIMESTAMP = structure(c(1451539800, 1451540100,
1451540400, 1451540700, 1451541000, 1451541300, 1451541600, 1451541900,
1451542200, 1451542500, 1451542800, 1451543100, 1451543400), class = c("POSIXct",
"POSIXt"), tzone = ""), value1 = c(45, 50, 68, 78, 99, 100, 5,
9, 344, 10, 45, 68, 33)), .Names = c("TIMESTAMP", "value1"), row.names = c(NA,
-13L), class = "data.frame")
Source: https://stackoverflow.com/questions/39987875/r-aggregate-by-date-every-30min-mean