Why is as.Date slow on a character vector?

囚心锁ツ 2020-11-27 16:36

I started using the data.table package in R to boost the performance of my code. I am using the following code:

sp500 <- read.csv('../rawdata/GMTSP.csv')
days &         


        
5 answers
  •  死守一世寂寞
    2020-11-27 17:05

    As others mentioned, strptime (converting from character to POSIXlt) is the bottleneck here. Another simple solution is to use the fast_strptime function from the lubridate package instead.

    Here's what it looks like on my data:

    > tables()
         NAME      NROW  MB COLS                                     
    [1,] pp   3,718,339 126 session_id,date,user_id,path,num_sessions
         KEY         
    [1,] user_id,date
    Total: 126MB
    
    > pp[, 2]
                   date
          1: 2013-09-25
          2: 2013-09-25
          3: 2013-09-25
          4: 2013-09-25
          5: 2013-09-25
         ---           
    3718335: 2013-09-25
    3718336: 2013-09-25
    3718337: 2013-09-25
    3718338: 2013-10-11
    3718339: 2013-10-11
    
    > system.time(pp[, date := as.Date(fast_strptime(date, "%Y-%m-%d"))])
       user  system elapsed 
      0.315   0.026   0.344  
    

    For comparison:

    > system.time(pp[, date := as.Date(date, "%Y-%m-%d")])
       user  system elapsed 
    108.193   0.399 108.844 
    

    That's ~316 times faster!
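
To try the comparison without the original data, here is a minimal sketch on synthetic input. It assumes the data.table and lubridate packages are installed; the table `dt`, the row count `n`, and the generated dates are illustrative stand-ins for `pp`, not the data from the answer. Timings depend on the R version and platform, so the speedup you see may differ from the figures above.

```r
# Assumes data.table and lubridate are installed.
library(data.table)
library(lubridate)

# Hypothetical stand-in for `pp`: a column of ISO-formatted date strings.
set.seed(1)
n <- 2e5
dt <- data.table(
  date = format(as.Date("2013-01-01") + sample(0:364, n, replace = TRUE))
)

# Baseline: as.Date() with an explicit format parses via strptime().
t_base <- system.time(slow <- as.Date(dt$date, "%Y-%m-%d"))

# fast_strptime() returns POSIXlt (UTC by default); as.Date() then converts it.
t_fast <- system.time(fast <- as.Date(fast_strptime(dt$date, "%Y-%m-%d")))

# Both routes should produce the same dates.
stopifnot(all(slow == fast))

# Convert the column in place, as in the answer above.
dt[, date := as.Date(fast_strptime(date, "%Y-%m-%d"))]

# Show the two timings side by side.
print(rbind(as.Date = t_base, fast_strptime = t_fast))
```

Note that fast_strptime only handles numeric formats, so this swap applies to inputs like "%Y-%m-%d" and not to strings with month or weekday names.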
