Dealing with Messy Dates

南方客 2021-01-31 08:18

I hope you didn't think I was asking for relationship advice.

Infrequently, I have to offer survey respondents the ability to specify when an event occurred. What resu

5 Answers
  •  情书的邮戳
    2021-01-31 09:00

    My sympathy that your date didn't turn out as pretty as expected. ;-)

    I have constructed a (still partial) solution along the lines suggested by @Rguy.

    (Please note that this code still has a bug: it doesn't always return the correct time. For some reason, it doesn't always do a greedy match on the digits before the colon, thus sometimes returning 1:00 when the time is 11:00.)

    First, construct a helper function that wraps around gsub and grepl. This function takes a character vector of patterns as one of its arguments and collapses it into a single string separated by |. This lets you pass several alternative patterns to a single regex match:

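    find.pattern <- function(x, pattern_list){
      # Combine all patterns into one alternation: "p1|p2|..."
      pattern <- paste(pattern_list, collapse="|")
      # Replace each string with the captured match; strings without a match come back unchanged
      ret <- gsub(paste("^.*(", pattern, ").*", sep=""), "\\1", x, ignore.case=TRUE)
      ret[ret==x] <- NA  # unchanged result: treat as no match for now
      # Strings that match a pattern in their entirety are restored as-is
      ret2 <- grepl(paste("^(", pattern, ")$", sep=""), x, ignore.case=TRUE)
      ret[ret2] <- x[ret2]
      ret
    }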
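
    A likely cause of the bug noted above, offered as a sketch rather than a verified diagnosis of every case: the leading `^.*` in the gsub pattern is itself greedy, so it consumes as many characters as it can while still letting the capture group match. That leaves `\\d+` with only the last digit before the colon. One possible alternative is to extract the match directly with regexpr and regmatches (both base R). The helper name find.first is hypothetical and not part of the original answer.

    ```r
    # Shows the greedy-prefix effect on the problem case from the note above.
    gsub("^.*(\\d+[\\.:h]\\d+).*", "\\1", "11:00 am")

    # Hypothetical alternative helper: return the leftmost match, or NA if none.
    find.first <- function(x, pattern_list) {
      x <- as.character(x)
      pattern <- paste(pattern_list, collapse = "|")
      m <- regexpr(pattern, x, ignore.case = TRUE)  # -1 where nothing matches
      hit <- !is.na(m) & m != -1
      ret <- rep(NA_character_, length(x))
      ret[hit] <- regmatches(x, m)                  # one substring per matching element
      ret
    }

    find.first(c("11:00 am", "Feb 8, 2011 / 12:20pm", "df"), "\\d+[\\.:h]\\d+")
    ```

    Note that this returns the leftmost match in each string, whereas the gsub approach tends to return the rightmost one, so results may differ for strings that contain several candidates.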
    

    Next, use R's built-in constants month.name and month.abb to construct a vector of full and abbreviated month names:

    all.month <- c(month.name, month.abb)
    

    Finally, construct a data frame with different extracts:

    ret <- data.frame(
        data = dat, 
        date1 = find.pattern(dat, "\\d+/\\d+/\\d+"),
        date2 = find.pattern(dat, 
          paste(all.month, "\\s*\\d+[(th)|,]*\\s{0,3}[(2010)|(2011)]*", collapse="|", sep="")),
        year = find.pattern(dat, c(2010, 2011)),
        month = find.pattern(dat, month.abb), #Use base R variable called month.abb for month names
        hour = find.pattern(dat, c("\\d+[\\.:h]\\d+", "12 noon")),
        ampm = find.pattern(dat, c("am", "pm"))
    )
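
    A hedged aside on the date2 pattern: inside square brackets, `(`, `)` and `|` are literal characters, so `[(th)|,]*` matches any run of the characters `(`, `t`, `h`, `)`, `|` and `,` rather than the suffix "th" or a comma. It happens to work on the sample data, but a grouped alternation states the intent more directly. The names day.loose and day.strict below are illustrative only.

    ```r
    day.loose  <- "^\\d+[(th)|,]*$"  # character class, as used in the date2 pattern
    day.strict <- "^\\d+(th|,)?$"    # grouped alternation: optional "th" or ","

    x <- c("4th", "18,", "4hh", "4)|")
    loose.hits  <- grepl(day.loose, x)   # accepts all four strings
    strict.hits <- grepl(day.strict, x)  # accepts only "4th" and "18,"
    data.frame(x, loose.hits, strict.hits)
    ```

    The same reasoning applies to `[(2010)|(2011)]*`, which matches any run of the characters `0`, `1`, `2`, `(`, `)` and `|` rather than one of the two years.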
    

    The results:

    head(ret, 50)
                              data  date1        date2 year month  hour ampm
    20   April 4th around 10am   <NA>    April 4th <NA>   Apr  <NA>   am
    21   April 4th around 10am   <NA>    April 4th <NA>   Apr  <NA>   am
    22     Mar 18, 2011 9:33am   <NA> Mar 18, 2011 2011   Mar  9:33   am
    23     Mar 18, 2011 9:27am   <NA> Mar 18, 2011 2011   Mar  9:27   am
    24                      df   <NA>         <NA> <NA>  <NA>  <NA> <NA>
    25                      fg   <NA>         <NA> <NA>  <NA>  <NA> <NA>
    26                   12:16   <NA>         <NA> <NA>  <NA> 12:16 <NA>
    27                    9:50   <NA>         <NA> <NA>  <NA>  9:50 <NA>
    28   Feb 8, 2011 / 12:20pm   <NA>  Feb 8, 2011 2011   Feb  2:20   pm
    29         8:34 am  2/4/11 2/4/11         <NA> <NA>  <NA>  8:34   am
    30     Jan 31, 2011 2:50pm   <NA> Jan 31, 2011 2011   Jan  2:50   pm
    31     Jan 31, 2011 2:45pm   <NA> Jan 31, 2011 2011   Jan  2:45   pm
    32     Jan 31, 2011 2:38pm   <NA> Jan 31, 2011 2011   Jan  2:38   pm
    33     Jan 31, 2011 2:26pm   <NA> Jan 31, 2011 2011   Jan  2:26   pm
    34                   11h09   <NA>         <NA> <NA>  <NA> 11h09 <NA>
    35                11:00 am   <NA>         <NA> <NA>  <NA>  1:00   am
    36                 1h02 pm   <NA>         <NA> <NA>  <NA>  1h02   pm
    37                   10h03   <NA>         <NA> <NA>  <NA> 10h03 <NA>
    38                    2h10   <NA>         <NA> <NA>  <NA>  2h10 <NA>
    39 Jan 13, 2011 9:50am Van   <NA> Jan 13, 2011 2011   Jan  9:50   am
    40            Jan 12, 2011   <NA> Jan 12, 2011 2011   Jan  <NA> <NA>
    
