Why are lubridate functions so slow when compared with as.POSIXct?

挽巷 2020-12-14 16:53

As the title says: why are the lubridate functions so much slower than as.POSIXct?

library(lubridate)
library(microbenchmark)

Dates <- sample(c(dates = format(seq(ISOdate(         


        
2 Answers
  • 2020-12-14 17:33

    For the same reason cars are slow in comparison to riding on top of rockets. The added ease of use and safety make cars much slower than a rocket but you're less likely to get blown up and it's easier to start, steer, and brake a car. However, in the right situation (e.g., I need to get to the moon) the rocket is the right tool for the job. Now if someone invented a car with a rocket strapped to the roof we'd have something.

    Start by looking at what dmy is doing and you'll see where the speed difference comes from (by the way, from your benchmarks I wouldn't say that lubridate is that much slower, as these timings are in milliseconds):

    dmy #type this into the command line and you get:

    > dmy
    function (..., quiet = FALSE, tz = "UTC") 
    {
        dates <- unlist(list(...))
        parse_date(num_to_date(dates), make_format("dmy"), quiet = quiet, 
            tz = tz)
    }
    <environment: namespace:lubridate>
    

    Right away I see parse_date and num_to_date and make_format. Makes one wonder what all these guys are. Let's see:

    parse_date

    > parse_date
    function (x, formats, quiet = FALSE, seps = find_separator(x), 
        tz = "UTC") 
    {
        fmt <- guess_format(head(x, 100), formats, seps, quiet)
        parsed <- as.POSIXct(strptime(x, fmt, tz = tz))
        if (length(x) > 2 & !quiet) 
            message("Using date format ", fmt, ".")
        failed <- sum(is.na(parsed)) - sum(is.na(x))
        if (failed > 0) {
            message(failed, " failed to parse.")
        }
        parsed
    }
    <environment: namespace:lubridate>
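
    The "guess a format from the first 100 values, then parse everything with it" pattern in parse_date can be sketched in base R. The source of guess_format is not shown here, so pick_format below is a hypothetical stand-in, not lubridate's actual logic; the sample data and candidate formats are also made up for illustration.

    ```r
    # Hypothetical sample input and candidate formats (not from the question).
    x <- c("01-12-2010", "15-03-2011")
    candidates <- c("%d-%m-%y", "%d-%m-%Y")

    # Hypothetical stand-in for guess_format(): try each candidate on a small
    # sample and keep the first one that reproduces the input exactly.
    pick_format <- function(x, candidates, tz = "UTC") {
      sample_x <- head(x, 100)
      for (fmt in candidates) {
        parsed <- strptime(sample_x, fmt, tz = tz)
        # Round-trip check: a format that matches only a prefix of the string
        # (for example %y against a four-digit year) will not print back the same.
        ok <- !is.na(parsed) & format(parsed, fmt) == sample_x
        if (all(ok)) return(fmt)
      }
      NA_character_
    }

    fmt <- pick_format(x, candidates)
    # Same final step as in parse_date: one strptime call with the chosen format.
    parsed <- as.POSIXct(strptime(x, fmt, tz = "UTC"))
    print(fmt)
    print(parsed)
    ```

    Every candidate that gets tried costs an extra strptime pass over the sample, which is work a direct as.POSIXct call with a known format never does.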
    

    num_to_date

    > getAnywhere(num_to_date)
    A single object matching ‘num_to_date’ was found
    It was found in the following places
      namespace:lubridate
    with value
    
    function (x) 
    {
        if (is.numeric(x)) {
            x <- as.character(x)
            x <- paste(ifelse(nchar(x)%%2 == 1, "0", ""), x, sep = "")
        }
        x
    }
    <environment: namespace:lubridate>
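
    To see what num_to_date buys you, here is a local copy of the body listed above (named num_to_date_copy so it does not depend on lubridate internals) applied to a couple of made-up numeric dates. A number such as 1012010 has lost its leading zero, and the padding puts it back so the string can be read as day, month, year.

    ```r
    # Local copy of the num_to_date body shown above, for illustration only.
    num_to_date_copy <- function(x) {
      if (is.numeric(x)) {
        x <- as.character(x)
        # Odd number of digits means a leading zero was dropped: restore it.
        x <- paste(ifelse(nchar(x) %% 2 == 1, "0", ""), x, sep = "")
      }
      x
    }

    padded <- num_to_date_copy(c(1012010, 10122010))
    print(padded)                          # "01012010" "10122010"
    print(num_to_date_copy("01-12-2010"))  # character input is returned unchanged
    ```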
    

    make_format

    > getAnywhere(make_format)
    A single object matching ‘make_format’ was found
    It was found in the following places
      namespace:lubridate
    with value
    
    function (order) 
    {
        order <- strsplit(order, "")[[1]]
        formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y", 
            "%Y"))[order]
        grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
        lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
    }
    <environment: namespace:lubridate>
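
    Running a local copy of that body (named make_format_copy here, again to avoid reaching into lubridate's namespace) shows how many candidate formats a single "dmy" order expands into.

    ```r
    # Local copy of the make_format body shown above, for illustration only.
    make_format_copy <- function(order) {
      order <- strsplit(order, "")[[1]]
      formats <- list(d = "%d", m = c("%m", "%b"), y = c("%y", "%Y"))[order]
      grid <- expand.grid(formats, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
      lapply(1:nrow(grid), function(i) unname(unlist(grid[i, ])))
    }

    fmts <- make_format_copy("dmy")
    print(length(fmts))  # 4: two month codes times two year codes
    print(fmts[[1]])     # "%d" "%m" "%y"
    ```

    Each of these is a candidate that the format guesser then has to consider before any date is actually parsed.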
    

    Wow, we've got strsplit-ting, expand.grid-ing, paste-ing, ifelse-ing, unname-ing, etc., plus a Whole Lotta Error Checking Going On (a play on the Zep song). So what we have here is some nice syntactic sugar. Mmmmm, tasty, but it comes with a price: speed.

    Compare that to as.POSIXct:

    getAnywhere(as.POSIXct)  #tells us to use methods to see the business
    methods('as.POSIXct')    #tells us all the business
    as.POSIXct.date          #what I believe your code is using (I don't use dates though)
    

    There's a lot more internal code and less error checking going on with as.POSIXct. So you have to ask: do I want ease and safety, or speed and power? It depends on the job.
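
    If you already know the format, the "speed and power" route is a single base R call. A minimal sketch with made-up dates (the question's own data is not reproduced here); the NA count at the end is a cheap way to get back the "failed to parse" check that parse_date performs.

    ```r
    # Hypothetical sample input in day-month-year order.
    x <- c("01-12-2010", "15-03-2011", "not a date")

    # No format guessing: the caller states the format up front.
    parsed <- as.POSIXct(x, format = "%d-%m-%Y", tz = "UTC")

    # Strings that do not match the format come back as NA.
    failed <- sum(is.na(parsed)) - sum(is.na(x))
    print(parsed)
    print(failed)
    ```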

  • 2020-12-14 17:58

    @Tyler's answer is correct. Here's some more info including a tip on making lubridate faster - from the help file:

    "Lubridate has an inbuilt very fast POSIX parser, ported from the fasttime package by Simon Urbanek. This functionality is as yet optional and could be activated with options(lubridate.fasttime = TRUE). Lubridate will automatically detect POSIX strings and use fast parser instead of the default strptime utility."
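
    In practice that would look something like the sketch below. The option name comes from the help text quoted above; whether your installed lubridate version still honours it is an assumption worth checking in its documentation. The timestamps are made up, and they are deliberately in POSIX-style year-month-day order, since that is the kind of string the quoted text says the fast parser detects.

    ```r
    # Option name taken from the quoted help text; setting an option that a
    # given lubridate version does not recognise has no effect.
    options(lubridate.fasttime = TRUE)

    if (requireNamespace("lubridate", quietly = TRUE)) {
      # Hypothetical POSIX-style timestamps.
      stamps <- c("2012-01-01 12:00:00", "2012-06-15 08:30:00")
      print(lubridate::ymd_hms(stamps))
    }
    ```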
