data.table time subset vs xts time subset

爷,独闯天下 提交于 2019-11-30 05:51:44

问题


Hi I am looking to subset some minutely data by time. I normally use xts doing something like:

subset.string <- 'T10:00/T13:00' 
xts.min.obj[subset.string]

to get all the rows which are between 10am and 1pm (inclusive) EACH DAY and have the output as an xts format. But is a bit slow for my purposes...e.g

j <- xts(rnorm(10e6),Sys.time()-(10e6:1))
system.time(j['T10:00/T16:00'])
   user  system elapsed 
  5.704   0.577  17.115 

I know that data.table is v fast and at subsetting large datasets so am wondering if in conjunction with the fasttime package to deal with fast POSIXct creations, if it would be worth it to create a function like

dt.time.subset <- function(xts.min.obj, subset.string){
  require(data.table)
  require(fasttime)
  x.dt <- data.table(ts=format(index(xts.min.obj),"%Y-%m-%d %H:%M:%S %Z"),
                     coredata(xts.min.obj))
  out <- x.dt[,some.subsetting.operation.using."%between%"]
  xts(out,fastPOSIXct(out[,ts])
}

to convert the xts.min.obj into a data.table add some sort of character index and then use data.table to subset the relevant rows use the output row index with fasttime to recreate an xts output? or is this too many excess operations for something that is already highly optimised and written in C?


回答1:


If you're ok with specifying your range in UTC, you can do:

j[(.index(j) %% 86400) %between% c(10*3600, 16*3600 + 60)]
# +60 because xts includes that minute; you'll need to offset the times
# appropriately to match with xts unless you live in UTC :)

j <- xts(rnorm(10e6),Sys.time()-(10e6:1))
system.time(j[(.index(j) %% 86400) %between% c(10*3600, 16*3600 + 60)])
#  user  system elapsed 
#  1.17    0.08    1.25 
# likely faster on your machine as mine takes minutes to run the OP bench


来源:https://stackoverflow.com/questions/17345951/data-table-time-subset-vs-xts-time-subset

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!