Format multiple date formats in one columns using lubridate

偶尔善良 提交于 2019-11-28 08:35:59

问题


Sometimes I am given data sets that has two different date formats but common variables that have to been joined into one dataframe. Over the years, I've tried various solutions to get around this workflow hassle. Now that I've been using lubridate, it seems like many of these problems are easily solved. However, I am encountering some behaviour that seems weird to me though I imagine there is a good explanation that is beyond me. Say I am given a data set with different date formats that I join into one data frame. This dataframe looks like this:

library(ludridate)
library(dplyr)

df<-data.frame(Lab=c("A","B"),DATE=c("12/15/15","12/15/2013")); df

I want to convert this data to a date format with lubridate. However the following does not format consistently:

df %>% 
  mutate(mdy(DATE))

...but rather creates a 0015 date. If I filter just for Lab "A":

df %>% 
  filter(Lab=="A") %>%
  mutate(mdy(DATE))

... or even group_by Lab:

df %>% 
  group_by(Lab) %>%
  mutate(mdy(DATE))

Then I get the desired year format. Is this the correct behaviour of the lubridate family of date formatting functions? Is there a better way to accomplish what I am doing? I am sure that multiple date formats in one column is a relatively common (and annoying) occurence.

Thanks in advance.


回答1:


From the help on parse_date_time:

## ** how to use select_formats **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC"   "2013-09-27 UTC"

## to give priority to %y format, define your own select_format function:

my_select <-   function(trained){
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y",     names(trained))*1.5
  names(trained[ which.max(n_fmts) ])
}

parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## '[1] "2013-09-27 UTC" "2013-09-27 UTC"



回答2:


parse_date_time of lubridate package can help format multiple date formats in one go.

Syntax:

df$date = parse_date_time(df$date, c(format1, format2, format3))

You need to specify all the possible format types.

Since lubridate has some difficulty understanding (correctly) some format types, you need to make custom format.

In the help section , you will find the below illustration. You can recreate it to suit your requirement.

## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC"   "2013-09-27 UTC"

## to give priority to %y format, define your own select_format function:

my_select <-   function(trained){
   n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
   names(trained[ which.max(n_fmts) ])
}

parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## '[1] "2013-09-27 UTC" "2013-09-27 UTC"


来源:https://stackoverflow.com/questions/34389139/format-multiple-date-formats-in-one-columns-using-lubridate

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!