Question
I am loading a CSV file into a data frame using:
library(stringr)    # str_match()
library(dplyr)      # %>%, mutate(), select(), arrange()
library(lubridate)  # parse_date_time()

str <- readLines("Messages.csv", n = -1, skipNul = TRUE)
matches <- str_match(str, pattern = "\\s*([0-9]{2}/[0-9]{2}/[0-9]{4}),\\s*([0-9]{2}:[0-9]{2}:[0-9]{2}),\\s*(Me|Them),\\s*(\\+[0-9]{11,12}),\\s*((?s).*)")
df <- data.frame(matches[, -1], stringsAsFactors = FALSE)
colnames(df) <- c("date", "time", "sender", "phone number", "msg")
# Format the date and create a column with the number of characters in each message
df <- df %>%
  # tz must be an argument of parse_date_time(); placed outside it, mutate() only adds a column named tz
  mutate(posix.date = parse_date_time(paste0(date, time), "%d%m%y%H%M%S", tz = "Europe/London")) %>%
  mutate(nb.char = nchar(msg)) %>%
  select(posix.date, sender, msg, nb.char) %>%
  arrange(as.numeric(posix.date))
I can change sender names using:
# Change the senders' names
df <- df %>%
  mutate(sender = replace(sender, sender == "Me", "Mr. Awesome"))
But I want to change the time zone of the data from tz="Europe/London" to tz="America/Los_Angeles".
I have tried both of the following without success:
attributes(df)$tz<-"America/Los_Angeles"
This runs without error, but nothing seems to change.
I also tried this:
df <- df %>%
mutate(date = replace(date, format(date, tz="America/Los_Angeles",usetz=TRUE)))
which gives the error: "Error in eval(expr, envir, enclos) : argument "values" is missing, with no default"
Perhaps I am not specifying the original time zone correctly, but I have no idea how to check whether it was applied.
Thanks!
Answer 1:
First, you can change the time zone of a POSIXct variable. It is not meaningful to "change the time zone in a data.frame", so setting a "tz" attribute of a data.frame does nothing.
[ Note: it is meaningful, however, to change the time zone of an xts object. See this post. ]
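As a minimal sketch of why the data-frame attribute has no effect, and of how to check which time zone a column actually carries: a POSIXct vector stores its display time zone in its own "tzone" attribute. The one-row data frame below is a hypothetical stand-in for the asker's df, not their real data.

```r
# Hypothetical one-row stand-in for the asker's data frame
df <- data.frame(posix.date = as.POSIXct("2015-01-01 12:00:00", tz = "Europe/London"))

# An attribute set on the data frame itself is stored, but the column never sees it
attr(df, "tz") <- "America/Los_Angeles"

# The column's own "tzone" attribute is what controls how it prints
attr(df$posix.date, "tzone")
# [1] "Europe/London"
format(df$posix.date, usetz = TRUE)
# [1] "2015-01-01 12:00:00 GMT"
```

Inspecting attr(df$posix.date, "tzone") is also a quick way to confirm whether a time zone passed at parse time was applied.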
I gather that your timestamps are in GMT and you want to convert them to the equivalent in PST. If that is what you intend, then this should work:
df$posix.date <- as.POSIXct(as.integer(df$posix.date),
                            origin = "1970-01-01",
                            tz = "America/Los_Angeles")
For example:
x <- as.POSIXct("2015-01-01 12:00:00", tz="Europe/London")
x
# [1] "2015-01-01 12:00:00 GMT"
as.POSIXct(as.integer(x),origin="1970-01-01",tz="America/Los_Angeles")
# [1] "2015-01-01 04:00:00 PST"
The issue here is that as.POSIXct(...) works differently depending on the class of the object passed to it. If you pass a character or integer, the time zone is set according to tz=.... If you pass an object that is already POSIXct, the tz=... argument is ignored. So here we convert x to integer so the tz=... argument is respected.
Really convoluted. If there's an easier way I'd love to hear about it.
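One possibly simpler route, offered as a sketch rather than a definitive fix: because a POSIXct value is just a count of seconds plus a "tzone" attribute used for display, overwriting that attribute changes the displayed zone without changing the instant. The variable names x and y are illustrative only.

```r
x <- as.POSIXct("2015-01-01 12:00:00", tz = "Europe/London")

# Copy x and overwrite only the display time zone
y <- x
attr(y, "tzone") <- "America/Los_Angeles"

format(y, usetz = TRUE)
# [1] "2015-01-01 04:00:00 PST"

# The underlying instant is unchanged
as.numeric(x) == as.numeric(y)
# [1] TRUE
```

The same assignment should work on a data frame column, for example attr(df$posix.date, "tzone") <- "America/Los_Angeles". Since the question already uses lubridate, with_tz(df$posix.date, "America/Los_Angeles") is an equivalent option.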
Source: https://stackoverflow.com/questions/32708528/how-to-change-a-time-zone-in-a-data-frame