Trouble finding non-unique index entries in zooreg time series

Question

I have several years of data that I'm trying to work into a zoo object (.csv at Dropbox). I get a warning once the data is coerced into a zoo object, but I cannot find any duplicates in the index.

df <- read.csv(choose.files(default = "", caption = "Select data source", multi = FALSE), na.strings="*")
df <- read.zoo(df, format = "%Y/%m/%d %H:%M", regular = TRUE, row.names = FALSE, col.names = TRUE, index.column = 1)
Warning message:
In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

I've tried:

sum(duplicated(df$NST_DATI))

But the result is 0.

Thanks for your help!

Answer 1:

You are using read.zoo(...) incorrectly. According to the documentation:

To process the index, read.zoo calls FUN with the index as the first argument. If FUN is not specified then if there are multiple index columns they are pasted together with a space between each. Using the index column or pasted index column:
1. If tz is specified then the index column is converted to POSIXct.
2. If format is specified then the index column is converted to Date.
3. Otherwise, a heuristic attempts to decide among "numeric", "Date" and "POSIXct".
If format and/or tz is specified then they are passed to the conversion function as well.

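You are specifying format=... without tz or FUN, so read.zoo(...) converts the index to Date, not POSIXct, and the time of day is discarded. Since your data contain many timestamps per day, there are many, many duplicated dates. (The original timestamp strings are unique, which is why your duplicated() check returned 0.)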
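A minimal base-R sketch of the collapse described above. The timestamps are hypothetical sample values in the same "%Y/%m/%d %H:%M" layout as the question's data; as.Date() is used directly here to mimic the Date conversion that read.zoo performs when only format is given.

```r
# Hypothetical sample timestamps in the question's "%Y/%m/%d %H:%M" layout
stamps <- c("2014/03/09 00:00", "2014/03/09 01:00",
            "2014/03/09 03:00", "2014/03/10 00:00")

# The raw strings are all distinct, so duplicated() finds nothing
sum(duplicated(stamps))    # 0

# as.Date() keeps only the calendar date, so same-day rows collide
as_date <- as.Date(stamps, format = "%Y/%m/%d %H:%M")
sum(duplicated(as_date))   # 2
```

Checking duplicates on the converted index, rather than on the raw column, is what reveals the problem.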

At first glance, the fix is to convert the index with as.POSIXct instead:

df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M")
# Error in read.zoo(df, FUN = as.POSIXct, format = "%Y/%m/%d %H:%M") : 
#   index has bad entries at data rows: 507 9243 18147 26883 35619 44355

As you can see, this does not work either, and here the problem is more subtle. The index is converted with as.POSIXct, but in the system time zone (US Eastern on my system). The referenced rows have timestamps that fall in the changeover from Standard Time to Daylight Saving Time, so those clock times do not exist in the US Eastern time zone and cannot be converted. If you use:

df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC")

the data imports correctly.
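The DST gap can be reproduced in base R without zoo. This is a sketch under stated assumptions: the timestamp is a hypothetical value inside the 2014 US spring-forward gap (clocks jumped from 02:00 to 03:00 on 2014-03-09), and "America/New_York" stands in for the answerer's US Eastern system time zone. How a non-existent local time is handled can vary by platform, so the Eastern result is printed rather than asserted.

```r
# Hypothetical timestamp inside the US spring-forward gap on 2014-03-09
gap <- "2014/03/09 02:30"

# In US Eastern this wall-clock time does not exist; the result may be NA
eastern <- as.POSIXct(gap, format = "%Y/%m/%d %H:%M", tz = "America/New_York")

# UTC has no DST transitions, so every wall-clock time is valid
utc <- as.POSIXct(gap, format = "%Y/%m/%d %H:%M", tz = "UTC")

print(eastern)
print(utc)
format(utc, "%Y-%m-%d %H:%M")   # "2014-03-09 02:30"
```

UTC is only a safe choice if the source data was actually recorded without DST shifts; otherwise pick the zone the data was logged in.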

EDIT:

As @G.Grothendieck points out, this would also work, and is simpler:

df <- read.zoo(df, tz="UTC")

You should set tz to whatever time zone is appropriate for the dataset.
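The simpler call needs no format string because, per the documentation quoted earlier, specifying tz makes read.zoo convert the index to POSIXct. The sketch below assumes that conversion behaves like a plain as.POSIXct(x, tz = ...) call and uses base R directly; as.POSIXct() then tries its default formats in turn, one of which is "%Y/%m/%d %H:%M". The timestamps are hypothetical values that include the 02:00 hour missing from US Eastern on 2014-03-09.

```r
# Hypothetical timestamps in the question's layout, including 02:00 on the
# 2014-03-09 DST changeover date
stamps <- c("2014/03/09 01:00", "2014/03/09 02:00", "2014/03/09 03:00")

# No format argument: as.POSIXct() tries its default formats in turn
idx <- as.POSIXct(stamps, tz = "UTC")

anyNA(idx)            # FALSE: every entry parsed
anyDuplicated(idx)    # 0: the index is unique
format(idx, "%H:%M")  # "01:00" "02:00" "03:00"
```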

Source: https://stackoverflow.com/questions/27361500/trouble-finding-non-unique-index-entries-in-zooreg-time-series

Tags

zoo