data difference in `as.POSIXct` with Excel

眼角桃花 2020-12-10 16:32

My actual data looks like:

8/8/2013 15:10
7/26/2013 10:30
7/11/2013 14:20
3/28/2013 16:15
3/18/2013 15:50

When I read this from the excel f

4 Answers
  • 2020-12-10 17:00

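    Maybe it is a matter of how R reads the data. Here is an example with lubridate that seems to work well. The values are in month/day/year order, so `mdy_hm` is the matching parser; `dmy_hm` only gives the right answer for dates such as 8/8, where day and month coincide, and fails or swaps day and month on the other rows.

    x <- "8/8/2013 15:10"
    library(lubridate)
    mdy_hm(x, tz = "GMT")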
    [1] "2013-08-08 15:10:00 GMT"
    
  • 2020-12-10 17:06

    The problem is that either R or Excel is rounding the underlying number to a few decimals. When you convert, for example, the cell containing 8/8/2013 15:10 to text formatting (in Excel on Mac OS X), you get the number 41494.63194.

    When you use:

    as.POSIXct(41494.63194*86400, origin="1899-12-30",tz="GMT")
    

    it will give you:

    [1] "2013-08-08 15:09:59 GMT"
    

    This is 1 second off from the original date (which is also an indication that 41494.63194 is rounded to five decimals).
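
    As a rough sketch of that rounding effect (an illustration added here, not part of the original answer): five decimals of a day correspond to an error of at most about 0.43 seconds, so rounding to the nearest whole second before converting should recover the original time. This assumes the original times have no sub-second part, and it would not help if the number had been cut to two decimals, where the error can reach several minutes.

    ```r
    serial <- 41494.63194              # Excel serial date, rounded to five decimals
    secs   <- round(serial * 86400)    # nearest whole second since 1899-12-30
    fixed  <- as.POSIXct(secs, origin = "1899-12-30", tz = "GMT")
    format(fixed, "%Y-%m-%d %H:%M:%S")
    # [1] "2013-08-08 15:10:00"
    ```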

    Probably the best solution is to export your Excel file to a .csv or a tab-separated .txt file and then read that into R. This at least gives me the correct dates:

    > df
                datum
    1  8/8/2013 15:10
    2 7/26/2013 10:30
    3 7/11/2013 14:20
    4 3/28/2013 16:15
    5 3/18/2013 15:50
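
    A minimal sketch of that route, assuming the export has a single column named `datum` as in the output above. The in-memory `csv_text` is a hypothetical stand-in for the exported file; with a real file you would pass its path to `read.csv` instead. The column arrives as text, so it still needs an explicit format to become a date-time.

    ```r
    # Hypothetical stand-in for the exported .csv file (same five rows as above).
    csv_text <- paste(c("datum",
                        "8/8/2013 15:10",
                        "7/26/2013 10:30",
                        "7/11/2013 14:20",
                        "3/28/2013 16:15",
                        "3/18/2013 15:50"), collapse = "\n")

    # Read the column as plain character, not factor.
    df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

    # Parse month/day/year hour:minute explicitly; rows that do not match become NA.
    df$datum <- as.POSIXct(df$datum, format = "%m/%d/%Y %H:%M", tz = "GMT")
    anyNA(df$datum)
    ```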
    
  • 2020-12-10 17:21

    This is how it works over here on a Windows system. This is what a source Excel 2010 file looks like:

    date                num         secs        constant    Rtime
    (mm/dd/yyyy)        (in Excel)  (num*86400) (Windows)   (secs-constant) 
    08/08/2013 15:10    41494.63    3585136200  2209161600  1375974600
    07/26/2013 10:30    41481.44    3583996200  2209161600  1374834600
    11/07/2013 14:20    41585.60    3592995600  2209161600  1383834000
    03/28/2013 16:15    41361.68    3573648900  2209161600  1364487300
    03/18/2013 15:50    41351.66    3572783400  2209161600  1363621800
    
    Rtime <- c(1375974600,1374834600,1383834000,1364487300,1363621800)
    as.POSIXct(Rtime,origin="1970-01-01",tz="GMT")
    #[1] "2013-08-08 15:10:00 GMT" "2013-07-26 10:30:00 GMT"
    #[3] "2013-11-07 14:20:00 GMT" "2013-03-28 16:15:00 GMT"
    #[5] "2013-03-18 15:50:00 GMT"
    

    Why this constant? Firstly, because Excel, and Office generally, is a mess when dealing with dates. Seriously, look over here: Why is 1899-12-30 the zero date in Access / SQL Server instead of 12/31?

    2209161600 is the difference in seconds between the POSIXct start of 1970-01-01 and 1899-12-30, which is the 0 point in Excel on Windows.

    dput(as.POSIXct(2209161600,origin="1899-12-30",tz="GMT"))
    #structure(0, tzone = "GMT", class = c("POSIXct", "POSIXt"))
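
    The arithmetic in the table can be folded into one small function. This is only a sketch: `excel_to_posixct` is a hypothetical helper name, it assumes the Windows (1900) date system described above, treats the Excel time as GMT, and rounds to whole seconds to absorb decimals lost in the serial number. The test value 41494.63194 is the five-decimal serial quoted in the previous answer.

    ```r
    # Hypothetical helper: Excel serial number (1900 date system) -> POSIXct in GMT.
    excel_to_posixct <- function(num) {
      offset <- 2209161600             # seconds from 1899-12-30 to 1970-01-01
      as.POSIXct(round(num * 86400) - offset, origin = "1970-01-01", tz = "GMT")
    }

    # The constant is just the gap between the two origins, in seconds.
    gap <- as.numeric(difftime(as.POSIXct("1970-01-01", tz = "GMT"),
                               as.POSIXct("1899-12-30", tz = "GMT"),
                               units = "secs"))
    gap == 2209161600
    # [1] TRUE

    excel_to_posixct(41494.63194)
    # [1] "2013-08-08 15:10:00 GMT"
    ```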
    
  • 2020-12-10 17:22

    Given

    x <- c("8/8/2013 15:10","7/26/2013 10:30","7/11/2013 14:20","3/28/2013 16:15","3/18/2013 15:50")
    

    (which is read as a character vector),

    try

    x <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "GMT")
    

    It reads correctly as a POSIXct vector for me.
