Can I apply a function over a vector using base tryCatch?

既然无缘 2021-01-16 16:24

I'm trying to parse dates (using lubridate functions) from a vector which has mixed date formats.

departureDate <- c("Aug 17, 2020 12:00:00 AM", "Nov


        
3 answers
  •  抹茶落季
    2021-01-16 17:09

    One method is to iterate over a vector of candidate formats and apply each one only to the elements that have not yet parsed successfully.

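    # Candidate formats, tried in order
    # Note: R documents %p for use with %I, not %H; with %H the hour is taken as written
    fmts <- c("%b %d, %Y %H:%M:%S %p", "%d/%m/%Y")
    # Start with an all-NA POSIXct vector of the same length as the input
    dates <- rep(Sys.time()[NA], length(departureDate))
    for (fmt in fmts) {
      isna <- is.na(dates)     # elements not yet parsed
      if (!any(isna)) break    # stop once everything has parsed
      # Strings that do not match fmt yield NA and are retried with the next format
      dates[isna] <- as.POSIXct(departureDate[isna], format = fmt)
    }
    dates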
    #  [1] "2020-08-17 12:00:00 PDT" "2019-11-19 12:00:00 PST" "2020-12-21 12:00:00 PST"
    #  [4] "2020-12-24 12:00:00 PST" "2020-12-24 12:00:00 PST" "2020-04-19 12:00:00 PDT"
    #  [7] "2019-06-28 00:00:00 PDT" "2019-08-16 00:00:00 PDT" "2019-02-04 00:00:00 PST"
    # [10] "2019-04-10 00:00:00 PDT" "2019-07-28 00:00:00 PDT" "2019-07-26 00:00:00 PDT"
    # [13] "2020-06-22 12:00:00 PDT" "2020-04-05 12:00:00 PDT" "2021-05-01 12:00:00 PDT"
    as.Date(dates)
    #  [1] "2020-08-17" "2019-11-19" "2020-12-21" "2020-12-24" "2020-12-24" "2020-04-19" "2019-06-28"
    #  [8] "2019-08-16" "2019-02-04" "2019-04-10" "2019-07-28" "2019-07-26" "2020-06-22" "2020-04-05"
    # [15] "2021-05-01"
    

    I encourage you to put the most likely formats first in the fmts vector, so that most elements are parsed on the first pass.

    As written, the loop stops (via break) as soon as every element has been parsed, so no further formats are attempted.
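
    If the loop is needed in more than one place, it can be wrapped in a function. The following is only a sketch: the name parse_mixed_dates and the tz argument are hypothetical and not part of the original answer, and it assumes an English or C locale so that %b matches month abbreviations such as "Aug".

    ```r
    # Hypothetical helper (not from the original answer): try each format in
    # turn, filling only the elements that are still NA.
    parse_mixed_dates <- function(x, fmts, tz = "UTC") {
      # All-NA POSIXct vector of the same length as the input
      out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
      for (fmt in fmts) {
        todo <- is.na(out)
        if (!any(todo)) break
        # Non-matching strings come back as NA rather than raising an error
        out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
      }
      out
    }

    x <- c("Aug 17, 2020 12:00:00", "28/06/2019", "not a date")
    res <- parse_mixed_dates(x, c("%b %d, %Y %H:%M:%S", "%d/%m/%Y"))
    as.Date(res)
    # [1] "2020-08-17" "2019-06-28" NA
    ```

    Because as.POSIXct() returns NA for strings that do not match the format instead of signalling an error, no tryCatch is needed here; anything that matches none of the formats simply stays NA.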


    Edit: if your locale does not recognize AM/PM, one option is to strip those markers from the strings first:

    departureDate <- gsub("\\s[AP]M$", "", departureDate)
    departureDate
    #  [1] "Aug 17, 2020 12:00:00" "Nov 19, 2019 12:00:00" "Dec 21, 2020 12:00:00"
    #  [4] "Dec 24, 2020 12:00:00" "Dec 24, 2020 12:00:00" "Apr 19, 2020 12:00:00"
    #  [7] "28/06/2019"            "16/08/2019"            "04/02/2019"           
    # [10] "10/04/2019"            "28/07/2019"            "26/07/2019"           
    # [13] "Jun 22, 2020 12:00:00" "Apr 5, 2020 12:00:00"  "May 1, 2021 12:00:00" 
    

    and then use a simpler format:

    fmts <- c("%b %d, %Y %H:%M:%S", "%d/%m/%Y")
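
    To see whether the locale issue from the edit applies on a given machine, one can check how the current locale renders the AM/PM marker. This is a rough diagnostic only, under the assumption that an empty or non-"PM" result means %p will not match the input. The two sample strings are taken from the data shown above.

    ```r
    # Which locale governs date/time parsing and formatting?
    Sys.getlocale("LC_TIME")

    # How does this locale render the AM/PM marker for 13:00?
    # English locales typically give "PM"; some locales give an empty string.
    format(as.POSIXct("2020-01-01 13:00:00", tz = "UTC"), "%p")

    # Strip a trailing AM/PM marker; strings without one are left unchanged.
    x <- gsub("\\s[AP]M$", "", c("Aug 17, 2020 12:00:00 AM", "28/06/2019"))
    x
    # [1] "Aug 17, 2020 12:00:00" "28/06/2019"
    ```

    Stripping the marker discards the AM/PM distinction, which does not matter if only the calendar date is needed.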
    
