R as.POSIXct() dropping hours minutes and seconds

Submitted by 给你一囗甜甜゛ on 2019-12-06 05:56:19

Question


I am experimenting with R to analyse some measurement data. I have a .csv file containing more than 2 million lines of measurements. Here is an example:

2014-10-22 21:07:03+00:00,7432442.0
2014-10-22 21:07:21+00:00,7432443.0
2014-10-22 21:07:39+00:00,7432444.0
2014-10-22 21:07:57+00:00,7432445.0
2014-10-22 21:08:15+00:00,7432446.0
2014-10-22 21:08:33+00:00,7432447.0
2014-10-22 21:08:52+00:00,7432448.0
2014-10-22 21:09:10+00:00,7432449.0
2014-10-22 21:09:28+00:00,7432450.0

After reading in the file, I want to convert the time column to a proper date-time using as.POSIXct(). For small files this works fine, but for large files it does not.

As an example, I read in a big file, copied a small portion of it into a temp variable, and then applied as.POSIXct() to the timestamp column of both. I included an image of the result. As you can see, when applied to the temp variable it correctly keeps the hours, minutes and seconds. However, when applied to the whole file, only the date is stored. It also takes a LOT of time (more than 2 minutes).

What could cause this anomaly? Is it due to some system limit, since I'm running this on my laptop?

Edit

On my Windows 7 machine I run R 3.1.3, which shows this problem. However, on Ubuntu 14.01 running R 3.0.2, the times are kept for the large files. I just noticed there is a newer version (3.2.0) for Windows; I will update and check whether the issue persists.


Answer 1:


You can try the code below.
It will:

  • read the datetime column as character instead of factor
  • update the column by reference (without copying the table)

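library(data.table)
# fread() reads the timestamp column as character, not factor
data <- fread("C:/RData/house2_electricity_main.csv")
# := converts V1 in place (by reference), without copying the table
data[, V1 := as.POSIXct(V1)]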

There was a recent question about using fasttime::fastPOSIXct instead of as.POSIXct, which can speed up the conversion further.
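Another option that stays in base R (this is not fasttime, and not something the question tried) is to pass an explicit format to as.POSIXct(). R then parses with that single format instead of guessing one from its list of candidates, and any row that does not match becomes NA rather than silently switching the whole column to date-only parsing. A minimal sketch on two sample lines from the question, assuming the "+00:00" suffix means the data is UTC:

```r
# Two timestamps copied from the question; the "+00:00" suffix is assumed to mean UTC
x <- c("2014-10-22 21:07:03+00:00", "2014-10-22 21:07:21+00:00")

# With an explicit format R does not guess; strptime() ignores the unparsed
# trailing "+00:00", so only the date and the clock time are read
ts <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

format(ts, "%H:%M:%S")
```

On a vector that contains a date-only entry, that entry would come back NA, so the offending rows can be located with which(is.na(ts)).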

As for the title question: once the column is POSIXct, you can truncate it to whatever granularity you need, e.g. with data.table's year(), month(), mday() functions:

data[, .SD, by = .(year(V1),month(V1),mday(V1))]
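If data.table is not loaded, base R's trunc() method for date-times does the same kind of truncation. A sketch using four timestamps from the question, assuming UTC; counting readings per minute is only an illustrative aggregation, and per_min and counts are hypothetical names:

```r
# Four timestamps from the question, assumed to be UTC
ts <- as.POSIXct(c("2014-10-22 21:07:03", "2014-10-22 21:07:21",
                   "2014-10-22 21:08:15", "2014-10-22 21:09:28"),
                 tz = "UTC")

# trunc() drops every field below the requested unit (here: seconds)
per_min <- trunc(ts, units = "mins")

# Count readings per minute; format() turns the truncated times into labels
counts <- table(format(per_min, "%Y-%m-%d %H:%M"))
counts
```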



Answer 2:


Possibly the problem is that somewhere in your dataset there are dates without a time. Try the following example:

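library(lubridate)
# five timestamps one minute apart, converted to character
dates <- as.character(now() + minutes(1:5))
# append one date without a time
dates <- c(dates, "2015-05-10")
as.POSIXct(dates[1:5])  # times are kept
as.POSIXct(dates)       # times are dropped for every element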

It first creates a vector dates containing 5 dates with times and converts them to character. Then I add another date (as a character) that does not contain a time. When you run the two conversions to POSIXct, you'll notice that the times are gone from the result as soon as the date without a time is included, because as.POSIXct() chooses a single format for the whole vector and only the date-only format fits all six entries.
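A reproducible version of the same experiment, using fixed strings instead of now() and no lubridate. In recent R versions the candidate formats that as.POSIXct() tries are listed in the tryFormats argument of as.POSIXlt(), and the first one that parses every non-NA element is used. A sketch assuming UTC; the exact fallback behaviour may differ between R versions:

```r
# Fixed timestamps (UTC assumed) instead of now(), so the result is reproducible
with_time <- c("2014-10-22 21:07:03", "2014-10-22 21:07:21")
mixed <- c(with_time, "2015-05-10")  # one extra entry without a time

a <- as.POSIXct(with_time, tz = "UTC")  # "%Y-%m-%d %H:%M:%OS" fits every element
b <- as.POSIXct(mixed, tz = "UTC")      # only "%Y-%m-%d" fits every element

format(a, "%H:%M:%S")  # times kept
format(b, "%H:%M:%S")  # every element is now midnight
```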

So the first few rows of your data apparently all contain a time, but some later row may not. There are probably many solutions to this problem; I'll just propose one that came to mind.

The first step is to change your read command, such that the dates are stored as characters instead of factors:

data <- read.csv("C:/RData/house2_electricity_Main.csv", header = FALSE, stringsAsFactors = FALSE)
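To try this step without the original file, read.csv() can read the sample lines from the question through its text argument. The colClasses argument is an addition (not part of the original answer) that pins the column types explicitly. Note that since R 4.0.0 stringsAsFactors already defaults to FALSE, so this step matters mainly on older versions such as the R 3.1.3 in the question:

```r
# Three lines from the question, inlined so the sketch runs without the CSV file
csv <- "2014-10-22 21:07:03+00:00,7432442.0
2014-10-22 21:07:21+00:00,7432443.0
2014-10-22 21:07:39+00:00,7432444.0"

# colClasses keeps the timestamps as character and the readings as numeric
data <- read.csv(text = csv, header = FALSE,
                 colClasses = c("character", "numeric"))
str(data)
```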

Then you can try to add the time to all the dates that have none and convert to POSIXct only afterwards:

# entries of 11 characters or fewer (date only) get an explicit midnight time
data$V1 <- ifelse(nchar(data$V1) > 11, data$V1, paste0(data$V1, " 00:00:00"))
data$V1 <- as.POSIXct(data$V1)

This worked for my small example above. It is not the most elegant solution, and maybe someone has a better idea.
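A variant of the same idea that avoids padding the strings: parse every entry with an explicit date-time format, then re-parse only the entries that came back NA with a date-only format. A sketch assuming all timestamps are UTC; x, parsed and date_only are illustrative names, not from the original answer:

```r
# Mixed input: one full timestamp (from the question) and one date-only entry
x <- c("2014-10-22 21:07:03+00:00", "2015-05-10")

# First pass: full date-time format; date-only strings fail and become NA
parsed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Second pass: re-parse only the failures as plain dates, i.e. midnight UTC
date_only <- is.na(parsed)
parsed[date_only] <- as.POSIXct(x[date_only], format = "%Y-%m-%d", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S")
```

Any entry that is still NA after the second pass matches neither format and deserves a closer look.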




Answer 3:


I had a similar problem with as.POSIXlt(X) dropping the hour:minute:second information, where X was a vector of POSIXct objects that happened to have tzone = "UTC".

However, as.POSIXlt(X, tz="UTC") kept the hour:minute:second information.
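A small sketch of the call that worked in this answer, on a single timestamp taken from the question: with tz given explicitly, the clock-time fields of the POSIXlt result are available directly. It does not reproduce the original problem, only the working call:

```r
# One timestamp from the question, stored as POSIXct in UTC
X <- as.POSIXct("2014-10-22 21:07:03", tz = "UTC")

# Converting with an explicit tz keeps the clock time in the POSIXlt fields
lt <- as.POSIXlt(X, tz = "UTC")
c(hour = lt$hour, min = lt$min, sec = lt$sec)
```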



Source: https://stackoverflow.com/questions/30038701/r-as-posixct-dropping-hours-minutes-and-seconds
