R: time series with duplicate time index entries

Posted by 十年热恋 on 2019-12-13 12:30:11

Question


I am a n00b at R and a n00b at Stack Overflow (just joined), so forgive me if I have failed to use markup (which I don't know) or missed something in the readme.

If you don't mind, I will go through my full problem here, as perhaps you might be kind enough to shed some light on how I should best go about this!

Stage 1
Construction of individual time-series objects for each TS. Please find a data example below. I am loading a csv file that contains multiple irregular time series (TS1 and TS2 in the example below). Ideally, I would split these into individual irregular time-series objects (e.g. zoo?), one per series: TS1, TS2, ... This problem was discussed here (R/zoo: handle non-unique index entries but not lose data?), but I have tried repeatedly to use that approach and failed.

 Date TS Data 
 21/05/2014 TS1 0.95  
 17/04/2014 TS1 1.02   
 27/03/2014 TS1 0.90   
 30/01/2014 TS1 0.80   
 12/12/2013 TS1 0.70  
 18/09/2013 TS1 0.67  
 01/11/2012 TS1 0.71  
 01/11/2012 TS1 0.70  
 21/05/2014 TS2 0.47  
 20/05/2014 TS2 0.51  
 16/05/2014 TS2 0.49  
 15/05/2014 TS2 0.55  
 10/05/2014 TS2 0.63  
 07/05/2014 TS2 0.77  

As can be seen, the problem arises from the duplicate date index 01/11/2012 for TS1, which prevents read.zoo from creating my split data object.
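To make the duplicate-index problem concrete, here is a minimal base R sketch that finds the rows sharing a (Date, TS) pair and collapses them. It uses three rows copied from the sample above; the choice of mean as the collapsing function and the names dup and collapsed are assumptions for illustration, not something the question prescribes.

```r
# Three rows taken from the sample data above
DF <- read.table(text = "Date TS Data
01/11/2012 TS1 0.71
01/11/2012 TS1 0.70
21/05/2014 TS2 0.47", header = TRUE, as.is = TRUE)

# Flag every row whose (Date, TS) pair occurs more than once
key <- DF[c("Date", "TS")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(DF[dup, ])

# Collapse duplicates to one row per (Date, TS); mean is an assumed choice
collapsed <- aggregate(Data ~ Date + TS, data = DF, FUN = mean)
print(collapsed)
```

After collapsing, each series has a unique date index, so it can be turned into a zoo object without complaint.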

Stage 2
What I would then like to do is, on every irregular date, add together all the data as of that date. Since the time series are irregular, each with a different frequency, I would like to carry forward the prior value of a TS when it has no entry on a given date. For 21/05/2014 the calculation is straightforward, as both TS1 and TS2 have an entry, so the answer would be 0.47 + 0.95. For 20/05/2014, only TS2 has an entry, so the value to use for TS1 is the most recent one as of that date, i.e. the 17/04/2014 value of 1.02, and the calculation should be 0.51 + 1.02. The simplest way of achieving this might be to convert each TS into daily values, so that the previous value is used until a new data point arrives, but this is wasteful and unnecessary for stage 3 below.
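The carry-forward sum described in Stage 2 can be sketched with zoo without expanding to daily values. This is only an illustration on a handful of points from the sample data; the object names ts1, ts2, filled and total are made up for the example.

```r
library(zoo)

# A few observations from the sample data, one zoo object per series
ts1 <- zoo(c(0.90, 1.02, 0.95),
           as.Date(c("2014-03-27", "2014-04-17", "2014-05-21")))
ts2 <- zoo(c(0.51, 0.47),
           as.Date(c("2014-05-20", "2014-05-21")))

# merge() aligns on the union of dates, with NA where a series has no entry
m <- merge(ts1, ts2)

# Carry each series' last observation forward; keep leading NAs in place
filled <- na.locf(m, na.rm = FALSE)

# Sum across series on every date (NA until both series have started)
total <- zoo(rowSums(coredata(filled)), time(filled))

# Value on 20/05/2014: TS2's 0.51 plus TS1's carried-forward 1.02
v20 <- coredata(total)[time(total) == as.Date("2014-05-20")]
print(v20)
```

Only the dates that actually occur in some series appear in the result, so no daily grid is needed at this stage.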

Stage 3
Having created this aggregated sum of all the TS, I want to fit a polynomial curve to it. I also want to differentiate the fitted curve to find the rate of change as of today's date, as predicted by the fitted curve.
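One possible reading of Stage 3, as a rough sketch in base R: fit a polynomial with lm and differentiate it analytically. The summed values below were worked out by hand from the sample data (prior TS1 value carried forward); the degree of 2, the use of days since the first date as the predictor, and the names fitted_at and slope_at are assumptions for illustration.

```r
# Summed series derived by hand from the sample data in the question
dates <- as.Date(c("2014-05-07", "2014-05-10", "2014-05-15",
                   "2014-05-16", "2014-05-20", "2014-05-21"))
total <- c(1.79, 1.65, 1.57, 1.51, 1.53, 1.42)

# Use days since the first observation as the numeric predictor
days <- as.numeric(dates - dates[1])

# Quadratic fit; raw = TRUE gives ordinary coefficients b0 + b1*x + b2*x^2
fit <- lm(total ~ poly(days, 2, raw = TRUE))
b <- unname(coef(fit))

# Fitted polynomial and its analytic first derivative (units: per day)
fitted_at <- function(x) b[1] + b[2] * x + b[3] * x^2
slope_at  <- function(x) b[2] + 2 * b[3] * x

# Rate of change at the most recent observation
latest <- max(days)
print(slope_at(latest))
```

To evaluate the slope at today's date instead, pass as.numeric(Sys.Date() - dates[1]) to slope_at, bearing in mind that this extrapolates beyond the data.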

Any help would be much appreciated! I feel that repeatedly hitting my head against a wall would be more fun than doing anything more at this stage!!

Thanks

Updated: I now have the following code, thanks to Grothendieck.

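library(scales)
library(zoo)
library(ggplot2)

# Build one summed zoo series from a data frame whose 2nd column names the series
f <- function(z) {
  # split on column 2; average duplicate dates within a series
  zz <- read.zoo(z, header = TRUE, split = 2, format = "%d/%m/%Y", aggregate = mean)
  z.fill <- na.locf(zz)                      # carry the last observation forward
  z.fill <- (z.fill >= 0.5) * z.fill         # zero out values below 0.5
  z.fill <- na.fill(z.fill, 0)               # remaining NAs become 0
  zfill.mat <- matrix(z.fill, NROW(z.fill))
  z.sum <- rowSums(zfill.mat)                # sum across series on each date
  zsum <- zoo(z.sum, time(z.fill))
  return(zsum)
}

DF <- read.csv(file.choose(), header = TRUE, as.is = TRUE)
DF.S <- split(DF[-2], DF[[2]])   # one data frame per distinct value in column 2
user <- DF[1, 2]                 # column-2 value of the first row
Ret <- lapply(DF.S, f)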

I have a remaining problem:
Ret is a list whose elements are named after the values in column 2. I can access one by typing Ret$ followed by its name, but since user varies, I need to make this dynamic. I have tried to construct a dynamic expression, e.g.:
x <- paste("Ret$'",user,"'",sep = "");
plot(x)

but could not get this to evaluate.
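A note on why this does not evaluate, with a minimal sketch: paste() only builds a character string, so plot(x) receives text rather than the list element. Lists can be indexed by a name held in a variable using [[ ]]. The list contents and the names "alice" and "bob" below are hypothetical stand-ins for the real Ret.

```r
# Hypothetical stand-in for Ret: a named list with one element per user
Ret <- list(alice = c(1.2, 1.5, 1.4), bob = c(0.7, 0.9, 0.8))
user <- "bob"

# [[ ]] accepts a character value, so the name can come from a variable
x <- Ret[[user]]
print(x)

# Ret$user would instead look for an element literally called "user"
print(is.null(Ret$user))
```

With the real data, plot(Ret[[user]]) should then plot the element for whichever user was read from the file.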


Answer 1:


read.zoo has an aggregate= argument, which takes a function used to aggregate the values at duplicate times within the same series. Here we take the mean of duplicate days within each series, but you could use sum or any other function. (If the data were coming from a file, we would replace the text = Lines argument in read.zoo with something like "myfile.dat".) We then use na.locf to fill in the NAs, sum the rows, and use na.omit to drop any leading NAs, giving zsum. Next we compute a regularly spaced time grid g and a spline function splfun, and evaluate that function and its first derivative on the grid; converting the results back to zoo gives zspl and zder. Finally we plot them.

Lines <- "Date TS Data 
 21/05/2014 TS1 0.95  
 17/04/2014 TS1 1.02   
 27/03/2014 TS1 0.90   
 30/01/2014 TS1 0.80   
 12/12/2013 TS1 0.70  
 18/09/2013 TS1 0.67  
 01/11/2012 TS1 0.71  
 01/11/2012 TS1 0.70  
 21/05/2014 TS2 0.47  
 20/05/2014 TS2 0.51  
 16/05/2014 TS2 0.49  
 15/05/2014 TS2 0.55  
 10/05/2014 TS2 0.63  
 07/05/2014 TS2 0.77"

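library(zoo)

# one column per series (split = 2); duplicate dates within a series are averaged
z <- read.zoo(text = Lines, header = TRUE, split = 2, format = "%d/%m/%Y",
       aggregate = mean)
# carry last values forward, sum across series, drop dates where the sum is NA
zsum <- na.omit(zoo(rowSums(na.locf(z)), time(z)))

g <- seq(start(zsum), end(zsum), "day")          # daily grid
splfun <- splinefun(time(zsum), coredata(zsum))  # interpolating spline
zspl <- zoo(splfun(g), g)                        # spline values on the grid
zder <- zoo(splfun(g, deriv = 1), g)             # first derivative (per day)

plot(merge(zspl, zder))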
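If only the rate of change at the most recent date is wanted, rather than the whole derivative curve, the spline function can be evaluated at a single point. The following is a small self-contained sketch using base R only; the summed values are worked out by hand from the sample data, and the name latest_rate is made up for the example.

```r
# Summed series derived by hand from the sample data (dates from 07/05/2014 on)
dates <- as.Date(c("2014-05-07", "2014-05-10", "2014-05-15",
                   "2014-05-16", "2014-05-20", "2014-05-21"))
total <- c(1.79, 1.65, 1.57, 1.51, 1.53, 1.42)

# Days since the first observation as the spline's x values
x <- as.numeric(dates - dates[1])
splfun <- splinefun(x, total)

# First derivative of the interpolating spline at the last date (per day)
latest_rate <- splfun(max(x), deriv = 1)
print(latest_rate)
```

Because an interpolating spline passes through every point, its slope at the end can be sensitive to the last few observations; a smoothing fit may behave differently.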



Source: https://stackoverflow.com/questions/25812673/r-time-series-with-duplicate-time-index-entries
