Accurately converting from character->POSIXct->character with sub millisecond datetimes

末鹿安然 提交于 2019-11-26 16:37:56

问题


I have a character datetime column in a file. I load the file (into a data.table) and do things that require the column to be converted to POSIXct. I then need to write the POSIXct value back to file, but the datetime will not be the same (because it is printed incorrectly).

This print/formatting issue is well known and has been discussed several times. I've read some posts describing this issue. The most authoritative answers I found are given in response to this question. The answers to that question provide two functions (myformat.POSIXct and form) that are supposed to solve this issue, but they do not seem to work on this example:

x <- "04-Jan-2013 17:22:08.139"
options("digits.secs"=6)
form(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),format="%d-%b-%Y %H:%M:%OS3")
[1] "04-Jan-2013 17:22:08.138"
form(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),format="%d-%b-%Y %H:%M:%OS4")
[1] "04-Jan-2013 17:22:08.1390"
myformat.POSIXct(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),digits=3)
[1] "2013-01-04 17:22:08.138"
myformat.POSIXct(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),digits=4)
[1] "2013-01-04 17:22:08.1390"

My sessionInfo:

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                        
[5] LC_TIME=C                              

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] fasttime_1.0-0   data.table_1.8.9 bit64_0.9-2      bit_1.1-9
[5] sas7bdat_0.3     chron_2.3-43     vimcom_0.9-6    

loaded via a namespace (and not attached):
[1] tools_2.15.2

回答1:


Two things:

1) @statquant is right (and the otherwise known experts @Joshua Ulrich and @Dirk Eddelbuettel are wrong), and @Aaron in his comment, but that will not be important for the main question here:

POSIXlt by design is definitely more accurate in storing times than POSIXct: As its seconds are always in [0, 60), it has a granularity of about 6e-15, i.e., 6 femtoseconds which would be dozens of million times less granular than POSIXct.

However, this is not very relevant here (and for current R): Almost all operations, notably numeric ones, use the Ops group method (yes, not known to beginners, but well documented), just look at Ops.POSIXt which indeed trashes the extra precision by first coercing to POSIXct. In addition, the format()/print() ing uses 6 decimals after the "." at most, and hence also does not distinguish between the internally higher precision of POSIXlt and the "only" 100 nanosecond granularity of POSIXct.
(For the above reason, both Dirk and Joshua were lead to their wrong assertion: For all simple practical uses, the precision of *lt and *ct is made the same).

2) I do tend to agree that we (R Core) should improve the format()ing and hence print()ing of such fractions of seconds POSIXt objects (still after the bug fix mentioned by @Aaron above).
But then I may be wrong, and "we" have got it right, by some definition of "right" ;-)




回答2:


As the answers to the questions you linked to already say, how a value is printed/formatted is not the same as what the actual value is. This is just a printed representation issue.

R> as.POSIXct('2011-10-11 07:49:36.3')-as.POSIXlt('2011-10-11 07:49:36.3')
Time difference of 0 secs
R> as.POSIXct('2011-10-11 07:49:36.2')-as.POSIXlt('2011-10-11 07:49:36.3')
Time difference of -0.0999999 secs

Your understanding that POSIXct is less precise than POSIXlt is incorrect. You're also incorrect in saying that you can't include a POSIXlt object as a column in a data.frame.

R> x <- data.frame(date=Sys.time())
R> x$date <- as.POSIXlt(x$date)
R> str(x)
'data.frame':   1 obs. of  1 variable:
 $ date: POSIXlt, format: "2013-03-13 07:38:48"



回答3:


So I guess you do need a little fudge factor added to my suggestion here: https://stackoverflow.com/a/7730759/210673. This seems to work but perhaps might include other bugs; test carefully and think about what it's doing before using for anything important.

myformat.POSIXct <- function(x, digits=0) {
  x2 <- round(unclass(x), digits)
  attributes(x2) <- attributes(x)
  x <- as.POSIXlt(x2)
  x$sec <- round(x$sec, digits) + 10^(-digits-1)
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}



回答4:


When you write

My understanding is that POSIXct representation is less precise than the POSIXlt representation

you are plain wrong.

It is the same representation for both -- down to milliseconds on Windows, and down to (almost) microseconds on the other OSs. Did you read help(DateTimeClasses) ?

As for your last question, yes the development version of my RcppBDT package uses Boost Date.Time and can go all the way to nanoseconds if your OS supports it and you turned the proper representation on. But it does replace POSIXct, and does not yet support vectors of time objects.

Edit: Regarding your follow-up question:

R> one <- Sys.time(); two <- Sys.time(); two - one
Time difference of 7.43866e-05 secs
R>
R> as.POSIXlt(two) - as.POSIXlt(one)
Time difference of 7.43866e-05 secs
R> 
R> one    # options("digits.sec"=6) on my box
[1] "2013-03-13 07:30:57.757937 CDT"
R> 

Edit 2: I think you are simply experiencing that floating point representation on computers is inexact:

R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.138",
+                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
[1] 1357341728.13800001
R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.139",
+                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
[1] 1357341728.13899994
R> 

The difference is not precisely 1/1000 as you assumed.



来源:https://stackoverflow.com/questions/15383057/accurately-converting-from-character-posixct-character-with-sub-millisecond-da

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!