Converting to date in a character column that contains two date formats

后端 未结 4 1016

I have CSV file with \"date\" column but it contains two different date format as the following

7/12/2015 15:28 as m/d/yyyy hh:mm
18-04-2016 18:20  as d/m/yy         


        
相关标签:
4条回答
  • 2020-12-06 14:15

    Not all *.csv files are created equally. If the date information is coded as a date in Excel or an open office format but simply displayed in two different formats in the file, it would be much simpler to change the dates in a spreadsheet environment than trying to tackle the task in R.

    0 讨论(0)
  • 2020-12-06 14:25

    As @thelatemail stated, it is difficult to deal with ambiguous date formats. You have the worst here: a combination of (North American) month-day-year along with (rest-of-the work) day-month-year.

    Now, the anytime package helps here in general as it allows us to parse without requiring explicit formats while also allowing different input formats in the same string.

    However, it too must retain some sanity -- and hence does not support mixing d/m/y and m/d/y as you do here by default because there is just no way to automate this.

    But here we can just opt to add a single missing format, and all is well:

    R> library(anytime)
    R> anytime::addFormats("%d-%m-%Y %H:%M:%S")  # add a day-month-year variant
    R> anytime(c("7/12/2015 15:28", "18-04-2016 18:20"))
    [1] "2015-07-12 15:28:00 CDT" "2016-04-18 18:20:00 CDT"
    R> 
    

    And with that, the difference is a simple

    R> diff(anytime(c("7/12/2015 15:28", "18-04-2016 18:20")))
    Time difference of 281.119 days
    R> 
    
    0 讨论(0)
  • 2020-12-06 14:26

    External packages are not required if you simply have two datetime formats. Just run both formats through the parser and take the non-missing one:

    x <- c("7/12/2015 15:28","18-04-2016 18:20")
    pmax(
      as.POSIXct(x, format="%m/%d/%Y %H:%M", tz="UTC"),
      as.POSIXct(x, format="%d-%m-%Y %H:%M", tz="UTC"),
      na.rm=TRUE
    )
    #[1] "2015-07-12 15:28:00 UTC" "2016-04-18 18:20:00 UTC"
    

    As far as I know, there is absolutely no way to deal with ambiguous date formats automatically, so hard-coding is the way to go here probably.

    0 讨论(0)
  • 2020-12-06 14:36

    Here is another option with lubridate

    library(lubridate)
    parse_date_time(x, c("mdy HM", "dmy HM"))   
    #[1] "2015-12-07 15:28:00 UTC" "2016-04-18 18:20:00 UTC"
    

    data

    x <- c("7/12/2015 15:28","18-04-2016 18:20")
    
    0 讨论(0)
提交回复
热议问题