Converting a character string into a date in R

离开以前 2020-12-06 21:35

The data I'm trying to convert is supposed to be a date; however, it is formatted as mmddyyyy with no separation by dashes or slashes. In order to work with dates in R, I wo

2 answers
  • 2020-12-06 21:49

    Updated: Improved with @Richard Scriven's colClasses and simpler as.Date() suggestions

    Here are two similar methods that worked for me for taking a CSV file that contains dates in mmddyyyy format and getting R to recognize them as Date objects.

    Starting first with a simple file tv.csv:

    Series,FirstAir
    Quantico,09272015
    Muppets,09222015
    

    Method 1: All as string

    Once within R,

    > t = read.csv('tv.csv', colClasses = 'character')
    
    • imports tv.csv as a data frame named t
    • the colClasses = 'character' option causes all the data to be read as the character type (instead of being guessed as Factor, int, and so on)

    Examine its initial structure:

    > str(t)
    'data.frame':   2 obs. of  2 variables:
     $ Series  : chr  "Quantico" "Muppets"
     $ FirstAir: chr  "09272015" "09222015"
    
    • R has imported all as strings of characters, indicated here as type chr

    The chr values (character strings) are then easily converted into dates:

    > t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
    
    • as.Date() performs string to date conversion
    • "%m%d%Y" specifies how to interpret the input in t$FirstAir. Within R, the format codes are documented in ?strptime. On Linux, at least, running man date in a shell also brings up the manual for the date program, which lists similar formatting codes; for example, it says %m month (01..12)
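
    Method 1 can also be tried without creating tv.csv first. The following is a minimal sketch: the inline csv_text string and the data frame name tv are stand-ins chosen for this illustration (tv rather than t, so the base transpose function t() is not masked), and the data are the two rows shown above.

```r
# Inline copy of tv.csv so the example needs no file on disk (an assumption
# made for illustration; with a real file, pass its path instead of text =).
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Read every column as character so the leading zero in "09..." survives.
tv <- read.csv(text = csv_text, colClasses = "character")

# Parse mmddyyyy strings: %m = month, %d = day, %Y = four-digit year.
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")

str(tv$FirstAir)
```

    When a format is given, strings that do not match it come back as NA rather than raising an error, so checking anyNA(tv$FirstAir) after the conversion is a cheap sanity test.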

    Method 2: Import then fix only the date

    If for some reason you don't want a blanket conversion of everything to character on import (for example, the file has many variables and you want to keep R's automatic type recognition but "fix" just the one date variable), follow this method.

    Once within R,

    > t = read.csv('tv.csv')
    
    • imports tv.csv as a data frame named t

    Examine its initial structure:

    > str(t)
    'data.frame':   2 obs. of  2 variables:
     $ Series  : Factor w/ 2 levels "Muppets","Quantico": 2 1
     $ FirstAir: int  9272015 9222015
    >
    
    • R tries its best to guess the type of each variable (the Factor shown for Series is the pre-4.0.0 default; from R 4.0.0 onward read.csv reads strings as chr unless stringsAsFactors = TRUE)
    • As you can see, an immediate problem is that R has imported 09272015 in the FirstAir variable as int (integer) and dropped the leading zero. The 0 in 09 matters later for date conversion, so we need to restore it.

    This can be done in a single command, but for clarity I have broken it into two steps. First,

    > t$FirstAir = sprintf("%08d", t$FirstAir)
    
    • sprintf is a formatting function
    • 0 means pad with zeroes
    • 8 means pad to a width of 8 characters, because mmddyyyy is 8 characters in total
    • d is used when the input is an integer, which it currently is; recall that the str() output reported t$FirstAir as int
    • t$FirstAir is the variable we are both setting and using as input
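
    The padding step can be checked in isolation. This is a small sketch; the vector name first_air and the third value (a date in October, which already has 8 digits) are made up for the illustration.

```r
# Integers as R would import them from the CSV, leading zero already lost.
# The third value is an extra illustrative case that needs no padding.
first_air <- c(9272015L, 9222015L, 10281994L)

# %08d: format as a decimal integer, zero-padded to a width of 8 characters.
padded <- sprintf("%08d", first_air)

print(padded)
print(nchar(padded))
```

    Values for months 10 to 12 already have 8 digits, so the padding leaves them unchanged.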

    Check the result:

    > str(t$FirstAir)
     chr [1:2] "09272015" "09222015"
    
    • it was successfully converted from int to chr; for example, 9272015 became "09272015"

    Now that it is a chr (string) type, we can convert it, the same as in Method 1.

    > t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
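
    The single-command form mentioned earlier might look like the sketch below. The inline csv_text string and the names tv and tv2 are stand-ins for this illustration. The second half shows an alternative that is not part of the original answer: a named colClasses vector, which reads only FirstAir as character and leaves the other columns to R's type guessing.

```r
# Inline copy of tv.csv so the example needs no file on disk.
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Default import: FirstAir is guessed to be integer, losing the leading zero.
tv <- read.csv(text = csv_text)

# Pad back to 8 digits and parse the result in one expression.
tv$FirstAir <- as.Date(sprintf("%08d", tv$FirstAir), format = "%m%d%Y")

# Alternative (an assumption, not from the answer): name the one column to
# read as character, so no padding step is needed afterwards.
tv2 <- read.csv(text = csv_text, colClasses = c(FirstAir = "character"))
tv2$FirstAir <- as.Date(tv2$FirstAir, format = "%m%d%Y")

print(tv$FirstAir)
print(tv2$FirstAir)
```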
    

    Result

    We do a final check:

    > str(t$FirstAir)
     Date[1:2], format: "2015-09-27" "2015-09-22"
    

    In both cases, what were originally plain values in a text file have now been successfully converted into R Date objects.

  • 2020-12-06 21:55

    Have a look at lubridate's mdy() function:

    require(lubridate)
    a <- "10281994"
    mdy(a)
    

    gives you

    [1] "1994-10-28 UTC"
    

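    of class "POSIXct" "POSIXt", so a date-time in R (thanks to Joshua Ulrich for the correction). In recent lubridate versions, mdy() called without a tz argument returns a Date directly, printed as "1994-10-28".

    You can use as.Date(mdy(a)), which gives 1994-10-28, to get an object of class Date.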

    There are variants such as ymd and dmy within lubridate as well.
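
    A short sketch of how the function name tracks the order of the components in the input. It assumes the lubridate package is installed; the variable names and the dmy/ymd input strings are made up for the illustration and all encode the same day as the example above.

```r
library(lubridate)

# The function name spells out the order of the date components.
from_mdy <- mdy("10281994")  # month, day, year
from_dmy <- dmy("28101994")  # day, month, year
from_ymd <- ymd("19941028")  # year, month, day

# as.Date() makes the result a Date regardless of the lubridate version.
print(as.Date(from_mdy))
print(as.Date(from_dmy))
print(as.Date(from_ymd))
```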
