R/zoo: handle non-unique index entries but not lose data?


Question


I've a csv file of data points (e.g. financial ticks, experiment recordings, etc.), and my data has duplicate timestamps. Here is code demonstrating the problem:

library(zoo); library(xts)

csv="2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"

z1 = read.zoo(textConnection(csv), sep = ',')  # warns: some index entries are not unique
w1 = to.weekly(z1)
ep = endpoints(z1, "weeks", 1)
w1$Volume = period.apply(z1, ep, length)       # all 10 ticks counted

z2 = read.zoo(textConnection(csv), sep = ',', aggregate = TRUE)  # mean of each day's ticks
w2 = to.weekly(z2)
ep = endpoints(z2, "weeks", 1)
w2$Volume = period.apply(z2, ep, length)       # only 4 rows left after aggregation

vignette('zoo-faq'), entry 1, tells me that aggregate=TRUE gets rid of zoo's annoying warning message. But then the results change:

> w1
           z1.Open z1.High z1.Low z1.Close Volume
2011-11-04      50      50     41       41     10
> w2
           z2.Open z2.High z2.Low z2.Close Volume
2011-11-04      50      50   42.5     42.5      4

Is there another way to get rid of the warning message while still getting the same results as w1? (Yes, I know about suppressWarnings(), which is what I was using before, but I hate the idea.) (I also wondered about passing a custom aggregate function to read.zoo that would return OHLCV data for each day, but couldn't work out whether that is even possible.)
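For what it's worth, the last idea can at least be sketched in base R, bypassing read.zoo's aggregate= hook entirely: compute per-day open/high/low/close plus a tick count straight from the raw rows, so no observation is lost. The column names and the data frame layout here are my own choices, not anything zoo or xts produces:

```r
# Hypothetical base-R sketch: per-day OHLC + tick count from raw rows.
df <- read.csv(text = "2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45", header = FALSE, col.names = c("date", "price"))

# split() keeps each day's rows in file order, so p[1] is the day's
# first tick (Open) and p[length(p)] its last (Close)
ohlcv <- do.call(rbind, lapply(split(df$price, df$date), function(p)
  data.frame(Open = p[1], High = max(p), Low = min(p),
             Close = p[length(p)], Volume = length(p))))
ohlcv
```

Rolling the daily rows up to weeks would still need a second pass, so this is a workaround rather than a drop-in replacement for to.weekly().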


Answer 1:


Just as a simple variant on Dirk's suggestion, this should work:

z0 = read.csv(textConnection(csv), sep = ',', header = FALSE)
# nudge each row's date by a distinct, negligibly small amount so the
# index becomes strictly increasing and zoo no longer warns
z1 = zoo(z0$V2, as.Date(z0$V1) + (1:nrow(z0)) * 10^-10)



Answer 2:


You need a function to pad the time stamps with an "epsilon" increment to make them different.

I have also written one or two Rcpp-based functions to do that. Times are, after all, most often POSIXct, which is really a float (as as.numeric() reveals), so just loop over the time stamps and, whenever one equals its predecessor, keep adding a small delta of 1.0e-7, smaller than what POSIXct itself can represent. Reset the cumulative delta each time there is an actual break.
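The loop described above can be sketched in a few lines of base R (a plain-R stand-in for the Rcpp version; the function name is my own, and I bump eps to 1e-6 because at POSIXct magnitudes of roughly 1.5e9 seconds, increments much below about 4e-7 vanish in double rounding):

```r
# Plain-R sketch of the epsilon loop: compare each stamp to its
# (already adjusted) predecessor and nudge it just past it; a genuine
# break in the series leaves the original value untouched, which is
# the "reset" described above.
make_unique_times <- function(x, eps = 1e-6) {
  x <- as.numeric(x)              # POSIXct/Date are doubles underneath
  for (i in seq_along(x)[-1]) {
    if (x[i] <= x[i - 1]) x[i] <- x[i - 1] + eps
  }
  x
}
```

The real, C-level equivalents are the xts functions shown next.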

Edit: Try the make.index.unique() and make.time.unique() functions in the xts package:

R> sametime <- rep(Sys.time(), 3)
R> xts(1:3, order.by=make.time.unique(sametime))
                           [,1]
2011-12-20 06:52:37.547299    1
2011-12-20 06:52:37.547300    2
2011-12-20 06:52:37.547301    3
R> 

Edit 2: Here is another example for Date indexed objects:

R> samedate <- rep(Sys.Date(), 5)   # identical dates
R> xts(1:5, order.by=make.time.unique(as.POSIXct(samedate)))
                           [,1]
2011-12-19 18:00:00.000000    1
2011-12-19 18:00:00.000000    2
2011-12-19 18:00:00.000001    3
2011-12-19 18:00:00.000002    4
2011-12-19 18:00:00.000003    5
R> xts(1:5, order.by=as.Date(make.index.unique(as.POSIXct(samedate))))
           [,1]
2011-12-20    1
2011-12-20    2
2011-12-20    3
2011-12-20    4
2011-12-20    5
R> 

The first solution switches to POSIXct, which ends up six hours before midnight because my local timezone is GMT minus six. The second example does a round trip: convert to POSIXct, make the index unique, then convert back to Date.



Source: https://stackoverflow.com/questions/8570716/r-zoo-handle-non-unique-index-entries-but-not-lose-data
