Converting dates from Excel to R

梦毁少年i 2021-01-15 04:05

I am having difficulty converting dates from Excel (read from a CSV file) to R. Any help is much appreciated.

Here is what I'm doing:

df$date = as.Date(df$exce         


        
2 Answers
  • 2021-01-15 05:01

    Your data is formatted as Month/Day/Year so

    df$date = as.Date(df$excel.date, format = "%d/%m/%Y")
    

    should be

    df$date = as.Date(df$excel.date, format = "%m/%d/%Y")
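
    A small sketch of why the order of %d and %m matters. The sample value below is made up for illustration, and it uses a four-digit year so that %Y applies:

    ```r
    # Hypothetical sample value in Month/Day/Year order with a four-digit year.
    x <- "07/28/2005"

    # Day and month swapped: 28 is not a valid month, so parsing fails with NA.
    wrong <- as.Date(x, format = "%d/%m/%Y")

    # Month first, then day: parses as intended.
    right <- as.Date(x, format = "%m/%d/%Y")

    is.na(wrong)   # TRUE
    format(right)  # "2005-07-28"
    ```

    Note that a swapped format only fails visibly when the day is greater than 12. A value such as "04/05/2005" parses without complaint under either format, just to a different date, so it is worth checking a few rows by eye.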
    
  • 2021-01-15 05:05

    First of all, make sure the dates in your file are in an unambiguous format with full four-digit years (not just the last two digits). %Y means "year with century" (see ?strptime), but your data does not seem to include the century. So you can either use %y (at your own risk, see ?strptime again) or reformat the dates in Excel.

    It is also a good idea to use as.is=TRUE with read.csv when reading in these data; otherwise character vectors are converted to factors, which can lead to unexpected results.
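
    A minimal sketch of the difference, assuming a small made-up CSV in place of your real file (the `value` column and the rows are hypothetical; `excel.date` is the column name from the question). As far as I know, since R 4.0.0 read.csv no longer converts strings to factors by default, so the second call forces the old behaviour explicitly:

    ```r
    # Hypothetical CSV content standing in for the file exported from Excel.
    csv_text <- "excel.date,value\n7/28/05,1\n12/16/05,2\n"

    # as.is = TRUE keeps the date column as plain character strings.
    df_chr <- read.csv(text = csv_text, as.is = TRUE)
    class(df_chr$excel.date)  # "character"

    # Forcing factor conversion to mimic the old default.
    df_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)
    class(df_fac$excel.date)  # "factor"

    # A factor stores integer level codes; converting it to numbers returns
    # those codes, not anything derived from the date text.
    codes <- as.integer(df_fac$excel.date)
    ```

    With the character column you can pass the values straight to as.Date and know exactly which strings are being parsed.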

    And on Windows it may be easier to use RODBC to read the dates directly from an xls or xlsx file.

    (edit)

    The following may give a hint:

    > as.Date("13/04/2014", format= "%d/%m/%Y")
    [1] "2014-04-13"
    > as.Date(factor("13/04/2014"), format= "%d/%m/%Y")
    [1] "2014-04-13"
    > as.Date(factor("13/04/14"), format= "%d/%m/%Y")
    [1] "14-04-13"
    > as.Date(factor("13/04/14"), format= "%d/%m/%y")
    [1] "2014-04-13"
    

    (So as.Date can actually take care of factors. The magic happens in the as.Date.factor method, defined as:

    function (x, ...)  as.Date(as.character(x), ...)
    

    It is not a good idea to represent dates as factors, but in this case it is not a problem either. I think the problem is Excel, which saves your years as 2-digit numbers in the CSV file without asking you.)

    -

    The ?strptime help file says that the behaviour of %y is platform specific, so you can get different results on different machines. If there is no way of going back to the source and saving the CSV in a better way, you might use something like the following:

    x <- c("7/28/05", "7/28/05", "12/16/05", "5/1/06", "4/21/05", "1/25/07")
    
    repairExcelDates <- function(x, yearcol=3, fmt="%m/%d/%Y") {
     # split each date on "/" into a numeric matrix, one row per date
     x <-  do.call(rbind, lapply(strsplit(x, "/"), as.numeric))
     year <- x[,yearcol]
     if(any(year>99)) stop("don't know what to do")
     # if the 2-digit year <= current 2-digit year then add 2000, otherwise add 1900
     x[,yearcol] <- ifelse(year <= as.numeric(format(Sys.Date(), "%y")), year+2000, year + 1900)
     # reassemble the strings, now with 4-digit years, and parse them
     x <- apply(x, 1, paste, collapse="/")
     as.Date(x, format=fmt)
     }
    
    repairExcelDates(x)
    # [1] "2005-07-28" "2005-07-28" "2005-12-16" "2006-05-01" "2005-04-21"
    # [6] "2007-01-25"
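
    For comparison, here is a sketch of what plain %y does on input. According to ?strptime, values 00 to 68 are prefixed by 20 and values 69 to 99 by 19. The two strings below are made-up examples on either side of that pivot:

    ```r
    # Hypothetical two-digit-year strings on either side of the %y pivot.
    y <- c("1/25/68", "1/25/69")

    # Per ?strptime, %y reads 00-68 as 20xx and 69-99 as 19xx on input.
    parsed <- as.Date(y, format = "%m/%d/%y")
    format(parsed)  # "2068-01-25" "1969-01-25"
    ```

    So a record from 1968 would land in 2068 with plain %y, which is the kind of case the current-year comparison in repairExcelDates is meant to handle.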
    