Why is as.Date slow on a character vector?

囚心锁ツ 2020-11-27 16:36

I started using the data.table package in R to boost the performance of my code. I am using the following code:

sp500 <- read.csv('../rawdata/GMTSP.csv')
days &         


        
5 answers
  •  情深已故
    2020-11-27 17:06

    Thanks for the suggestions. I solved it by implementing Gauss's day-of-week algorithm myself instead of calling as.Date and weekdays, and got far better results; see below.

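    getWeekDay <- function(year, month, day) {
      # Gauss's algorithm for the day of the week: 0 - Sunday, ..., 6 - Saturday
      # January and February are treated as months 11 and 12 of the previous year
      Y <- year
      Y[month<3] <- (Y[month<3] - 1)
    
      d <- day
      m <- ((month + 9)%%12) + 1   # shifted month: March = 1, ..., February = 12
      c <- floor(Y/100)            # century part of the (shifted) year
      y <- Y-c*100                 # two-digit year within the century
      dayofweek <- (d + floor(2.6*m - 0.2) + y + floor(y/4) + floor(c/4) - 2*c) %% 7
      return(dayofweek)
    }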
    
    library(data.table)

    sp500 <- read.csv('../rawdata/GMTSP.csv')
    days <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
    
    # Using data.table to get things much, much faster
    sp500 <- data.table(sp500, key="Date")
    # Date is a string in mm/dd/yyyy format, so the parts can be sliced out directly
    sp500 <- sp500[,Month:=as.integer(substr(Date,1,2))]
    sp500 <- sp500[,Day:=as.integer(substr(Date,4,5))]
    sp500 <- sp500[,Year:=as.integer(substr(Date,7,10))]
    #sp500 <- sp500[,Date:=as.Date(Date, "%m/%d/%Y")]
    #sp500 <- sp500[,Weekday:=factor(weekdays(sp500[,Date]), levels=days, ordered=T)]
    # Map codes 0..6 to names explicitly, so labels stay correct even if some weekdays never occur
    sp500 <- sp500[,Weekday:=factor(getWeekDay(Year, Month, Day), levels=0:6, labels=days)]
    

    Running the whole block above (including reading the data from the CSV) gives the timing below. data.table is truly impressive.

       user  system elapsed 
     19.074   0.803  20.284 
    

    The conversion itself takes 3.49 seconds elapsed.
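
    Since the weekday formula replaces a library call, it may be worth checking it against base R before trusting it on real data. The snippet below is a minimal sketch of such a check, not part of the original answer: it repeats the getWeekDay arithmetic so that it runs on its own, and compares the result with the `wday` field of `as.POSIXlt` over an arbitrarily chosen date range (1990 to 2020, an assumption made only for illustration).

    ```r
    # Same arithmetic as getWeekDay() in the answer, repeated so this snippet is self-contained.
    getWeekDay <- function(year, month, day) {
      Y <- year
      Y[month < 3] <- Y[month < 3] - 1   # January/February belong to the previous year
      m <- ((month + 9) %% 12) + 1       # March = 1, ..., February = 12
      c <- floor(Y / 100)
      y <- Y - c * 100
      (day + floor(2.6 * m - 0.2) + y + floor(y / 4) + floor(c / 4) - 2 * c) %% 7
    }

    # Illustrative test range (an assumption, not taken from the data set).
    d  <- seq(as.Date("1990-01-01"), as.Date("2020-12-31"), by = "day")
    lt <- as.POSIXlt(d)

    # POSIXlt stores year as years since 1900, mon as 0-11, and wday as 0 (Sunday) to 6.
    got <- getWeekDay(lt$year + 1900, lt$mon + 1, lt$mday)

    # Stops with an error if any day disagrees with base R.
    stopifnot(all(got == lt$wday))
    ```

    Using `wday` rather than `weekdays()` keeps the comparison numeric and independent of the locale, since `weekdays()` returns translated day names.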
