How to make a set containing count of data in rolling set of buckets

最后都变了 - submitted 2019-12-22 09:47:14

Question


I have the server logs for a month's worth of traffic. A partial example is below:

"UploadDateGMT","UserFileSize","TotalBusinessUnits"
"2012-01-01 00:00:38","1223","1"
"2012-01-01 00:01:16","1302","1"
"2012-01-01 00:08:10","1302","1"

I would like to convert this into a data set that gives the total number of bytes submitted in each five-minute window, on a rolling basis (i.e. minutes 0-5, 1-6, 2-7, etc.). From this I could extract the maximum load and the 95% load, make pretty graphs of load, and so on.
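To pin down what "rolling" means here, the following is a minimal brute-force sketch in base R (not taken from either answer). It assumes the timestamps are GMT, that each window is the half-open interval [start, start + 5 min), that a window starts at every whole minute between the first and last upload, and that windows with no uploads count as 0 bytes. The names `uploads`, `up_time`, `starts`, and `window_bytes` are illustrative.

```r
# Sample rows from the question; the timestamps are assumed to be GMT.
uploads <- data.frame(
  UploadDateGMT = c("2012-01-01 00:00:38", "2012-01-01 00:01:16",
                    "2012-01-01 00:08:10"),
  UserFileSize  = c(1223, 1302, 1302),
  stringsAsFactors = FALSE
)
up_time <- as.POSIXct(uploads$UploadDateGMT, tz = "GMT")

# One window per whole minute, from the first upload's minute to the last's:
# [00:00, 00:05), [00:01, 00:06), [00:02, 00:07), ...
starts <- seq(as.POSIXct(trunc(min(up_time), "mins")),
              as.POSIXct(trunc(max(up_time), "mins")), by = "1 min")

# Total bytes whose upload time falls inside each 5-minute window.
window_bytes <- vapply(seq_along(starts), function(i) {
  in_window <- up_time >= starts[i] & up_time < starts[i] + 5 * 60
  sum(uploads$UserFileSize[in_window])
}, numeric(1))

max(window_bytes)              # peak 5-minute load
quantile(window_bytes, 0.95)   # 95th-percentile 5-minute load
```

This version rescans every upload for every window, so over a month of one-minute windows it can get slow with many rows; aggregating per minute first, as the answers do, scales better.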


Answer 1:


To expand on @PLapointe's answer:

library(xts)
# tab2 is the xts series of upload sizes built in @PLapointe's answer
endp <- endpoints(tab2, on = "mins", k = 1)  # 1-minute endpoints
onemin <- period.apply(tab2, endp, sum)      # sum per 1-minute period
onemin <- align.time(onemin)                 # align to end-of-period times
# all one-minute increments from the start to the end of onemin
allonemin <- seq(start(onemin), end(onemin), by = "1 min")
onemin <- merge(onemin, xts(, allonemin))    # minutes with no uploads become NA
fivemin <- rollapplyr(onemin, 5, sum, na.rm = TRUE, fill = NA)  # trailing 5-minute sums
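For readers who want to see the same steps without xts, here is a base-R sketch of the idea above: per-minute totals, every minute in the range filled in, then trailing 5-minute sums. It is not the answerer's code. `stats::filter()` stands in for `rollapplyr()`, empty minutes are filled with 0 rather than NA, the sample times are assumed to be GMT, and the variable names are illustrative.

```r
# Sample rows from the question; the timestamps are assumed to be GMT.
up_time <- as.POSIXct(c("2012-01-01 00:00:38", "2012-01-01 00:01:16",
                        "2012-01-01 00:08:10"), tz = "GMT")
bytes <- c(1223, 1302, 1302)

# 1. Bucket each upload by whole minute (minutes since the epoch).
minute_idx <- floor(as.numeric(up_time) / 60)
all_minutes <- seq(min(minute_idx), max(minute_idx))

# 2. Sum bytes per minute; minutes with no uploads become 0 instead of NA.
per_min <- tapply(bytes, factor(minute_idx, levels = all_minutes), sum)
per_min <- as.numeric(per_min)
per_min[is.na(per_min)] <- 0

# 3. Trailing 5-minute sums: element i adds minutes i-4..i (NA for the first 4).
rolling5 <- as.numeric(stats::filter(per_min, rep(1, 5), sides = 1))

# Label each window by the end of its last minute, like align.time() above.
window_end <- as.POSIXct((all_minutes + 1) * 60, origin = "1970-01-01", tz = "GMT")
fivemin_base <- data.frame(window_end = window_end, bytes = rolling5)
fivemin_base
```

Filling gaps with 0 matters for the load statistics: a quiet minute should pull a window's total down, not drop out of it.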



Answer 2:


The xts package will do the trick:

library(xts)
tab <- read.table(text = "UploadDateGMT,UserFileSize,TotalBusinessUnits
'2012-01-01 00:00:38',1223,1
'2012-01-01 00:01:16',1302,1
'2012-01-01 00:08:10',1302,1", header = TRUE, as.is = TRUE, sep = ",")

# create an xts object of file sizes indexed by upload time (the column is GMT)
tab2 <- xts(tab$UserFileSize, order.by = as.POSIXct(tab$UploadDateGMT, tz = "GMT"))
endp <- endpoints(tab2, on = "mins", k = 5)  # 5-minute endpoints
fivemin <- period.apply(tab2, endp, sum)     # sum per 5-minute period
fivemin

                    [,1]
2012-01-01 00:01:16 2525
2012-01-01 00:08:10 1302
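Note that `period.apply()` over these endpoints yields non-overlapping 5-minute totals, not the rolling windows the question asks about. As a point of comparison, here is a base-R sketch of the same non-overlapping totals. It assumes GMT timestamps and buckets aligned to multiples of 5 minutes since the epoch (so :00, :05, :10, ...); the names `key`, `totals`, and `buckets` are illustrative. Empty buckets are simply absent, as in the xts output above.

```r
# Sample rows from the question; the timestamps are assumed to be GMT.
up_time <- as.POSIXct(c("2012-01-01 00:00:38", "2012-01-01 00:01:16",
                        "2012-01-01 00:08:10"), tz = "GMT")
bytes <- c(1223, 1302, 1302)

# Number each 300-second bucket since the epoch, so buckets start
# at :00, :05, :10, ... and never overlap.
key <- floor(as.numeric(up_time) / 300)
totals <- tapply(bytes, key, sum)   # one total per non-empty bucket

# Label each bucket by its end time.
bucket_end <- as.POSIXct((as.numeric(names(totals)) + 1) * 300,
                         origin = "1970-01-01", tz = "GMT")
buckets <- data.frame(bucket_end = bucket_end, bytes = as.numeric(totals))
buckets
```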

If you want the timestamps to fall on 5-minute boundaries:

res <- align.time(fivemin[endpoints(fivemin, on = "mins", k = 5)], n = 60 * 5)  # move each timestamp to the next 5-minute boundary


Source: https://stackoverflow.com/questions/10741180/how-to-make-a-set-containing-count-of-data-in-rolling-set-of-buckets
