Convert dd/mm/yy and dd/mm/yyyy to Dates

Submitted by 匆匆过客 on 2019-11-26 08:35:07

Question


I have a character vector with dates in various formats, like this:

dates <- c("23/11/12", "20/10/2012", "22/10/2012", "23/11/12")

I want to convert these to Dates. I have tried the very good dmy from the lubridate package, but this does not work:

    dmy(dates)
[1] "0012-11-23 UTC" "2012-10-20 UTC" "2012-10-22 UTC" "0012-11-23 UTC"

It is treating the /12 year as if it is 0012.

So I am now trying to use regular expressions to select each type and convert them individually with as.Date(). However, the regular expression I wrote to select only the dd/mm/yy strings does not work.

dates[grep('[0-9]{2}/[0-9]{2}/[0-9]{2,2}', dates)]

returns

[1] "23/11/12"   "20/10/2012" "22/10/2012" "23/11/12"

I thought that {2,2} would match exactly 2 digits and therefore not select all of the strings. I'm not very good at regular expressions, so any help will be appreciated.

Thanks

EDIT

What I actually have are three different types of date as below

dates <- c("23-Jul-2013", "23/11/12", "20/10/2012", "22/10/2012", "23/11/12")

And I want to convert these to dates

parse_date_time(dates, c('dmy'))

gives me

[1] "2013-07-23" "0012-11-23" "2012-10-20" "2012-10-22" "0012-11-23"

However, this is wrong: 0012 should be 2012. I would like a (fairly simple) solution to this.

One solution I now have (thanks to @plannapus) is to use regular expressions. I ended up creating this function, as I was still getting some cases where the lubridate approach turned 12 into 0012:

    asDateRegex <- function(dates,
        # Selects strings from the vector dates using regexes and converts them to Dates
        regexes = c('[0-9]{2}/[0-9]{2}/[0-9]{4}', # dd/mm/yyyy
            '[0-9]{2}/[0-9]{2}/[0-9]{2}$', # dd/mm/yy
            '[0-9]{2}-[[:alpha:]]{3}-[0-9]{4}'), # dd-mon-yyyy
        orders = 'dmy',
        ...){
        require(lubridate)
        new_dates <- as.Date(rep(NA, length(dates)))
        for(reg in regexes){
            # Parse only the strings matching this pattern, keeping their positions
            idx <- grep(reg, dates)
            new_dates[idx] <- as.Date(parse_date_time(dates[idx], orders = orders))
        }
        new_dates
    }

asDateRegex(dates)
[1] "2012-10-20" "2013-07-23" "2012-11-23" "2012-10-22" "2012-11-23"

But this is not very elegant. Any better solutions?


Answer 1:


You can use parse_date_time from lubridate:

some.dates <- c("23/11/12", "20/10/2012", "22/10/2012" ,"23/11/12")
parse_date_time(some.dates,c('dmy'))
[1] "2012-11-23 UTC" "2012-10-20 UTC" "2012-10-22 UTC" "2012-11-23 UTC"

But note that the order of the formats is important:

some.dates <- c("20/10/2012","23/11/12",  "22/10/2012" ,"23/11/12")
parse_date_time(some.dates,c('dmY','dmy'))

[1] "2012-10-20 UTC" "2012-11-23 UTC" "2012-10-22 UTC" "2012-11-23 UTC"

EDIT

Internally parse_date_time is using guess_formats (which I guess uses some regular expressions):

guess_formats(some.dates,c('dmy'))
       dmy        dmy        dmy        dmy 
"%d/%m/%Y" "%d/%m/%y" "%d/%m/%Y" "%d/%m/%y" 

As mentioned in the comments, you can pass the guessed formats to as.Date like this:

as.Date(dates, format = guess_formats(dates,c('dmy')))



Answer 2:


You can choose the format based on the length of each date string.

y <- ifelse(nchar(dates) == 8, "y", "Y")
as.Date(dates, format = paste0("%d/%m/%", y))
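
To see why the choice between %y and %Y matters here, the following base R sketch parses the same two-digit-year string with both specifiers. The helper year_of and the variable names are illustrative only and are not part of the original answer.

```r
x <- "23/11/12"

# Illustrative helper: numeric calendar year of a Date.
year_of <- function(d) as.POSIXlt(d)$year + 1900

# %Y reads the year literally, so "12" becomes the year 12
# (the "0012" seen in the question).
literal_year <- as.Date(x, format = "%d/%m/%Y")

# %y reads a two-digit year and adds a century (00 to 68 map to 20xx).
short_year <- as.Date(x, format = "%d/%m/%y")

year_of(literal_year)
year_of(short_year)
```

This is why the length check above is enough for the slash-separated inputs: 8-character strings need %y, and 10-character strings need %Y.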



Answer 3:


You can use strsplit and nchar to get a subvector of dates where the year is two characters long:

> dates[sapply(strsplit(dates,"/"),function(x)nchar(x)[3]==2)]
[1] "23/11/12" "23/11/12"



Answer 4:


Following your original attempt at regex based solutions, you may try gsub using this regexp, then converting to any date-time format you wish...

#  Replace 4 digit years with two digit years
short <- gsub( "([0-9]{2})([0-9]{2})$" , "\\2" , dates )
#[1] "23/11/12" "20/10/12" "22/10/12" "23/11/12"


as.Date( short , format = "%d/%m/%y" )
#[1] "2012-11-23" "2012-10-20" "2012-10-22" "2012-11-23"
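
The same idea can be run in the opposite direction, expanding two-digit years instead of truncating four-digit ones. This is only a sketch, and it assumes that every two-digit year belongs to the 2000s, which the original post does not state.

```r
dates <- c("23/11/12", "20/10/2012", "22/10/2012", "23/11/12")

# ASSUMPTION: every two-digit year means 20xx.
# The pattern matches a slash followed by exactly two digits at the end of the
# string, so values that already have a four-digit year are left untouched.
long <- sub("/([0-9]{2})$", "/20\\1", dates)

parsed <- as.Date(long, format = "%d/%m/%Y")
format(parsed)
```

Expanding rather than truncating keeps the century explicit in the intermediate strings, at the cost of hard-coding the "20" prefix.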



Answer 5:


If you really want to do it with a regexp, you should use $ to signify that nothing follows the last two-digit number (i.e. it is at the end of the string):

dates[grep('[0-9]{2}/[0-9]{2}/[0-9]{2}$', dates)]
[1] "23/11/12" "23/11/12"
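
As a small illustration of the anchoring point, the sketch below compares an unanchored pattern with one anchored at both ends. Adding ^ as well as $ is an extra precaution that goes slightly beyond the answer above, and the variable names are illustrative.

```r
dates <- c("23/11/12", "20/10/2012", "22/10/2012", "23/11/12")

# Unanchored: the substring "20/10/20" inside "20/10/2012" already satisfies
# the pattern, so every element matches.
unanchored <- grepl("[0-9]{2}/[0-9]{2}/[0-9]{2}", dates)

# Anchored at both ends: only strings that are exactly dd/mm/yy match.
two_digit_year <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{2}$", dates)

dates[two_digit_year]
```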

Otherwise, in addition to the other answers you can have a look here and here for other ways of handling multiple date formats.




Answer 6:


Here's a base R way for the more general case not (yet) addressed in the unaccepted answers.

dates <- c("23-Jul-2013", "23/11/12", "20/10/2012", "22/10/2012" ,"23/11/12")
fmts <- list('%d-%b-%Y', '%d/%m/%y', '%d/%m/%Y')
d <- mapply(as.Date, list(dates), fmts, SIMPLIFY=FALSE)
max.d <- do.call(function(...) pmax(..., na.rm=TRUE), d)
min.d <- do.call(function(...) pmin(..., na.rm=TRUE), d)
max.d[max.d > Sys.Date()] <- min.d[max.d > Sys.Date()]
max.d
# [1] "2012-11-23" "2012-10-20" "2012-10-22" "2012-11-23"
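
The Sys.Date() comparison above relies on the mis-parsed candidate lying in the future (for example, "20/10/2012" read with %d/%m/%y yields 2020-10-20, because trailing characters are ignored). A sketch that does not depend on the current date is to choose the format per element with anchored patterns, still in base R. The pattern-to-format lookup below is a hypothetical illustration, and %b assumes a locale with English month abbreviations.

```r
dates <- c("23-Jul-2013", "23/11/12", "20/10/2012", "22/10/2012", "23/11/12")

# Hypothetical lookup: anchored pattern -> strptime format.
formats <- c(
  "^[0-9]{2}/[0-9]{2}/[0-9]{4}$"       = "%d/%m/%Y",
  "^[0-9]{2}/[0-9]{2}/[0-9]{2}$"       = "%d/%m/%y",
  "^[0-9]{2}-[[:alpha:]]{3}-[0-9]{4}$" = "%d-%b-%Y"  # locale-dependent
)

# Start with all-NA Dates and fill in the positions each pattern matches.
parsed <- rep(as.Date(NA), length(dates))
for (pattern in names(formats)) {
  hit <- grepl(pattern, dates)
  parsed[hit] <- as.Date(dates[hit], format = formats[[pattern]])
}
parsed
```

Strings that match none of the patterns stay NA, which makes unexpected formats easy to spot.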


Source: https://stackoverflow.com/questions/19425023/convert-dd-mm-yy-and-dd-mm-yyyy-to-dates
