Using Dates with the data.table package

寵の児 提交于 2019-12-04 18:35:47

问题


I recently discovered the data.table package and was now wondering whether or not I should replace some of my plyr-code. To summarize, I really like plyr and I basically achieved everything I wanted. However, my code runs a while and the outlook of speeding things up was enough for me to run some tests. Those tests ended quite soon and here is the reason.

What I do quite often with plyr is to split my data by a column containing dates and do some calculations:

library(plyr)
DF <-  data.frame(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
#Split up data and apply arbitrary function
ddply(DF, .(Date), function(df){mean(df$y) - df[nrow(df), "y"]})

However, using a column with the Date-format does not seem to work in data.table:

library(data.table)
DT <- data.table(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
#Error in setkey(DT, Date) : Column 'Date' cannot be auto converted to integer without losing information.

If I understand the package correctly, I only get substantial speed-ups when I use setkey(). Also, I think it wouldn't be good coding to constantly convert between Date and numeric. So am I missing something or is there just no easy way to achieve that with data.table?

sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.6.3 zoo_1.7-2        lubridate_0.2.5  ggplot2_0.8.9    proto_0.3-9.2    reshape_0.8.4   
[7] reshape2_1.1     xtable_1.5-6     plyr_1.5.2      

loaded via a namespace (and not attached):
[1] digest_0.5.0    lattice_0.19-30 stringr_0.5     tools_2.13.1 

回答1:


This should work:

DT <- data.table(Date=as.ITime(rep(c(Sys.time(), Sys.time() + 60), each=6)),
                 y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)

The data.table package contains some date/time classes with integer storage mode. See ?IDateTime:

Date and time classes with integer storage for fast sorting and grouping. Still experimental!

  • IDate is a date class derived from Date. It has the same internal representation as the Date class, except the storage mode is integer.
  • ITime is a time-of-day class stored as the integer number of seconds in the day. as.ITime does not allow days longer than 24 hours. Because ITime is stored in seconds, you can add it to a POSIXct object, but you should not add it to a Date object.
  • IDateTime takes a date-time input and returns a data table with columns date and time.


来源:https://stackoverflow.com/questions/6978970/using-dates-with-the-data-table-package

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!