Question
The data I'm trying to convert is supposed to be a date; however, it is formatted as mmddyyyy, with no dashes or slashes as separators. In order to work with dates in R, I would like to have this formatted as mm-dd-yyyy or mm/dd/yyyy.
I think I might need to use grep(), but I'm not sure how to use it to reformat all of the dates that are in the mmddyyyy format.
Answer 1:
Have a look at lubridate's mdy() function:
require(lubridate)
a <- "10281994"
mdy(a)
gives you
[1] "1994-10-28 UTC"
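of class "POSIXct" "POSIXt", so a datetime in R (thanks to Joshua Ulrich for the correction). Newer lubridate releases return a Date directly from mdy() when no tz argument is given, so the output there is [1] "1994-10-28".
You can use as.Date(mdy(a)), which gives 1994-10-28, to get an object of class Date.
There are variants such as ymd and dmy within lubridate as well.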
Answer 2:
Updated: Improved with @Richard Scriven's colClasses and simpler as.Date() suggestions
Here are two similar methods that worked for me, going from a csv file containing dates in mmddyyyy format to having them recognized by R as date objects.
Starting first with a simple file tv.csv:
Series,FirstAir
Quantico,09272015
Muppets,09222015
Method 1: All as string
Once within R,
> t = read.csv('tv.csv', colClasses = 'character')
- imports tv.csv as a data frame named t
- the colClasses = 'character' option causes all the data to be treated as the character data type (instead of Factor or int types)
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : chr "Quantico" "Muppets"
$ FirstAir: chr "09272015" "09222015"
- R has imported everything as strings of characters, indicated here as type chr
The chr (string of characters) values are then easily converted into dates:
> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
- as.Date() performs string to date conversion
- %m%d%Y specifies how to interpret the input in t$FirstAir. These format codes, at least on Linux, can be found by running $ man date, which brings up the manual for the date program, where there is a list of formatting codes. For example, it says %m month (01..12)
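As a minimal, self-contained sketch of Method 1 (not part of the original answer), the snippet below builds the same two rows in memory with read.csv(text = ...) rather than reading tv.csv from disk, and uses the hypothetical names csv_text, tv, slash_dates and dash_dates. It also shows format() turning the parsed dates into the mm/dd/yyyy or mm-dd-yyyy layout the question asked about. Within R, the same format codes are documented on the ?strptime help page.

```r
# Build the example data in memory instead of reading tv.csv from disk.
csv_text <- "Series,FirstAir\nQuantico,09272015\nMuppets,09222015"
tv <- read.csv(text = csv_text, colClasses = "character")

# Parse the mmddyyyy strings into Date objects.
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")

# Dates print as yyyy-mm-dd by default; format() renders them in
# another layout, returning character strings rather than Dates.
slash_dates <- format(tv$FirstAir, "%m/%d/%Y")
dash_dates <- format(tv$FirstAir, "%m-%d-%Y")

print(tv$FirstAir)
print(slash_dates)
print(dash_dates)
```

Note that the formatted values are plain strings again, so keep the Date column for sorting or date arithmetic and use format() only for display.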
Method 2: Import then fix only the date
If for some reason you don't want to import every column as character (for example, the file has many variables and you want to keep R's automatic type recognition and merely "fix" the one date variable), follow this method.
Once within R,
> t = read.csv('tv.csv')
- imports tv.csv as a data frame named t
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : Factor w/ 2 levels "Muppets","Quantico": 2 1
$ FirstAir: int 9272015 9222015
- R tries its best to guess the type of each variable
- The Factor type shown for Series comes from older versions of R; since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so Series is imported as chr instead
- As you can see, an immediate problem is that R has imported the FirstAir value 09272015 as int (integer) and dropped the leading zero. The 0 in 09 matters later for the date conversion, so we need to restore it.
This can be done in a single command but for clarity I have broken this into two steps. First,
> t$FirstAir = sprintf("%08d", t$FirstAir)
- sprintf is a formatting function
- 0 means pad with zeroes
- 8 means ensure 8 characters, because mmddyyyy is 8 characters in total
- d is used when the input is a number, which it currently is; recall that the str() output reported t$FirstAir as int, meaning integer
- t$FirstAir is the variable we are both setting and using as input
Check the result:
> str(t$FirstAir)
chr [1:2] "09272015" "09222015"
- it successfully converted from an int to a chr type; for example, 9272015 became "09272015"
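The padding step can be tried on its own with a small sketch like the one below. The vector first_air and the third value (12252015, a month that needs no padding) are assumptions added for illustration; they are not in the original file.

```r
# Integers as R would hold them once the leading zero has been lost.
# The third value is an added example whose month already has two digits.
first_air <- c(9272015L, 9222015L, 12252015L)

# "%08d": d = integer input, 8 = minimum width, 0 = pad on the left with zeros.
padded <- sprintf("%08d", first_air)
print(padded)

# Every value is now exactly 8 characters wide, as %m%d%Y expects.
print(nchar(padded))
```

Values that are already 8 digits long pass through unchanged, so the same call is safe for every month.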
Now that it is a string (chr type), we can convert it, the same as in Method 1.
> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
Result
We do a final check:
> str(t$FirstAir)
Date[1:2], format: "2015-09-27" "2015-09-22"
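The two steps of Method 2 can also be combined into a single command, as mentioned earlier. The following is a minimal sketch of that, again using in-memory data (the hypothetical csv_text and tv) instead of the tv.csv file, and calling as.Date() directly on the padded strings rather than going through strptime().

```r
csv_text <- "Series,FirstAir\nQuantico,09272015\nMuppets,09222015"

# Default import: FirstAir is guessed to be an integer column.
tv <- read.csv(text = csv_text)

# Pad back to 8 digits and parse as a date in one expression.
tv$FirstAir <- as.Date(sprintf("%08d", tv$FirstAir), format = "%m%d%Y")

str(tv$FirstAir)
```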
In both cases, what were originally values in a text file have now been successfully converted into R date objects.
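A possible middle ground between the two methods, not covered in the original answer, is to pass colClasses as a named vector so that only the date column is read as character and the leading zero is never lost. This is a sketch under the assumption that the column is called FirstAir, as in the example file; csv_text and tv are hypothetical names.

```r
csv_text <- "Series,FirstAir\nQuantico,09272015\nMuppets,09222015"

# Only FirstAir is forced to character; the types of all other
# columns are still guessed by read.csv().
tv <- read.csv(text = csv_text, colClasses = c(FirstAir = "character"))

# The strings still carry their leading zeros, so they parse directly.
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")

str(tv)
```

This avoids the sprintf() repair step while leaving automatic type detection in place for the remaining variables.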
Source: https://stackoverflow.com/questions/32854538/converting-a-character-string-into-a-date-in-r