Question
I am a n00b at R and a n00b at Stack Overflow (just joined), so forgive me if I have failed to use markup (which I don't know) or missed something in the readme.
If you don't mind, I will go through my full problem here as perhaps you might be kind enough to shed some insight into how I should best go about this!
Stage 1
Construction of individual time-series objects for each TS. Please find a data example below. Essentially, I am loading a csv file with multiple, irregular time-series in it (example TS1, TS2 below), so in an ideal world I would split these into individual, irregular time-series objects (e.g. zoo?), so TS1, TS2, ... This problem was discussed here (R/zoo: handle non-unique index entries but not lose data?), but I have tried repeatedly to use this approach and failed.
Date TS Data
21/05/2014 TS1 0.95
17/04/2014 TS1 1.02
27/03/2014 TS1 0.90
30/01/2014 TS1 0.80
12/12/2013 TS1 0.70
18/09/2013 TS1 0.67
01/11/2012 TS1 0.71
01/11/2012 TS1 0.70
21/05/2014 TS2 0.47
20/05/2014 TS2 0.51
16/05/2014 TS2 0.49
15/05/2014 TS2 0.55
10/05/2014 TS2 0.63
07/05/2014 TS2 0.77
As can be seen, the problem arises due to the duplicate date index of 01/11/2012 for TS1, which causes read.zoo not to create my split data object.
Stage 2
What I would then like to do is, on every irregular date, add all the data as of that date together. Since all the time-series are irregular, and with different regularity, I would like to use the prior value for a TS. E.g. for 21/05/2014, this calculation in the example is straightforward as both TS1 and 2 have an entry, so the answer would be 0.47 + 0.95. But for 20/05, only TS2 has an entry, so the value for TS1 that should be used is the most recent one as of that date, i.e. the 17/04/2014 value of 1.02, so the calculation for 20/05/2014 should be 0.51 + 1.02. It could be that the simplest way of achieving this might be to convert each TS into a daily value, such that the previous value is used until a new data point? but this is wasteful/unnecessary for stage 3 below.
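The carry-forward rule in Stage 2 can be checked on a cut-down copy of the example. The sketch below is only illustrative: it assumes the zoo package and keeps just four rows of the data above, and it calls na.locf with na.rm = FALSE so that dates before a series' first value stay NA rather than being dropped.

```r
library(zoo)

# Cut-down copy of the example data (four rows only, for illustration)
Lines <- "Date TS Data
21/05/2014 TS1 0.95
17/04/2014 TS1 1.02
21/05/2014 TS2 0.47
20/05/2014 TS2 0.51"

# split = 2 gives one column per series; dates a series lacks become NA
z <- read.zoo(text = Lines, header = TRUE, split = 2, format = "%d/%m/%Y")

# Carry each series' last observation forward, then add across series
filled <- na.locf(z, na.rm = FALSE)
total <- zoo(rowSums(filled), time(filled))

# 17/04/2014 is NA (TS2 has no value yet); 20/05/2014 is 0.51 + 1.02 = 1.53
total
```

No daily grid is needed for this: the sum is only evaluated on dates where at least one series has an entry.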
Stage 3
Having created this aggregated sum of all the TS, I want to do a polynomial curve-fit. I also want to differentiate this curve-fit to find the rate of change as of today's date predicted by the fitted curve.
Any help would be much appreciated! I feel that repeatedly hitting my head against a wall would be more fun than doing anything more at this stage!!
Thanks
Updated: I now have code as follows thanks to Grothendieck.
library(scales)
library(zoo)
library(ggplot2)
f <- function(z) {
  zz <- read.zoo(z, header = TRUE, split = 2, format = "%d/%m/%Y", aggregate = mean)
  z.fill <- na.locf(zz)
  z.fill <- (z.fill >= 0.5) * z.fill
  z.fill <- na.fill(z.fill, 0)
  zfill.mat <- matrix(z.fill, NROW(z.fill))
  z.sum <- rowSums(zfill.mat)
  zsum <- zoo(z.sum, time(z.fill))
  return(zsum)
}
DF <- read.csv(file.choose(), header = TRUE, as.is = TRUE)
DF.S <- split(DF[-2], DF[[2]])
user <- DF[1, 2]
Ret <- lapply(DF.S, f)
I have a remaining problem:
Ret contains a list of a data frame. I can access this by typing Ret$user, but since user varies, I need to make this dynamic. I have tried to construct a dynamic expression e.g.:
x <- paste("Ret$'",user,"'",sep = "");
plot(x)
but could not get this to evaluate.
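One possible way around this, sketched on a hypothetical stand-in for Ret (the toy list and its values below are made up for illustration): the $ operator does not evaluate its right-hand side, so Ret$user looks for an element literally named "user", and paste() only builds a character string that plot() never evaluates. Double-bracket indexing accepts a character variable instead.

```r
# Hypothetical stand-in for Ret: a named list keyed by series name
Ret <- list(TS1 = c(0.95, 1.02), TS2 = c(0.47, 0.51))
user <- "TS1"

# `$` looks for an element literally named "user", which does not exist here
by_dollar <- Ret$user

# `[[` evaluates `user` first, so this picks the element named "TS1"
by_bracket <- Ret[[user]]

# plot(Ret[[user]]) would then plot whichever element `user` names
```

Here by_dollar is NULL and by_bracket is the TS1 element, so no string has to be assembled or evaluated.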
Answer 1:
read.zoo has an aggregate= argument which takes a function that is used to aggregate the values at duplicate times in the same series. Here we take the mean of duplicate days within series but you could use sum or any other function. (If the data were coming from a file we would replace text = Lines argument in read.zoo with something like "myfile.dat".) Then we use na.locf to fill in the NAs, sum the rows and we use na.omit to drop any leading NAs giving zsum. Next we compute a regularly spaced time grid g and a spline function splfun evaluating that function and its derivative on the grid which, after converting back to zoo, give zspl and zder. Finally we plot them.
Lines <- "Date TS Data
21/05/2014 TS1 0.95
17/04/2014 TS1 1.02
27/03/2014 TS1 0.90
30/01/2014 TS1 0.80
12/12/2013 TS1 0.70
18/09/2013 TS1 0.67
01/11/2012 TS1 0.71
01/11/2012 TS1 0.70
21/05/2014 TS2 0.47
20/05/2014 TS2 0.51
16/05/2014 TS2 0.49
15/05/2014 TS2 0.55
10/05/2014 TS2 0.63
07/05/2014 TS2 0.77"
library(zoo)
z <- read.zoo(text = Lines, header = TRUE, split = 2, format = "%d/%m/%Y",
              aggregate = mean)
zsum <- na.omit(zoo(rowSums(na.locf(z)), time(z)))
g <- seq(start(zsum), end(zsum), "day")
splfun <- splinefun(time(zsum), coredata(zsum))
zspl <- zoo(splfun(g), g)
zder <- zoo(splfun(g, deriv = 1), g)
plot(merge(zspl, zder))
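Stage 3 of the question asks for a polynomial fit rather than a spline. A minimal sketch of that variant follows, assuming an ordinary least-squares fit with lm and poly(..., raw = TRUE). The degree (2), the helper dpoly, and the use of only the last six summed values from the example are illustrative choices, not part of the answer above.

```r
library(zoo)

# Stand-in for zsum: the last six summed values from the example data
dates <- as.Date(c("2014-05-07", "2014-05-10", "2014-05-15",
                   "2014-05-16", "2014-05-20", "2014-05-21"))
zsum <- zoo(c(1.79, 1.65, 1.57, 1.51, 1.53, 1.42), dates)

# Work in days since the first observation so the powers of x stay small
x0 <- as.numeric(start(zsum))
x <- as.numeric(time(zsum)) - x0
y <- as.numeric(coredata(zsum))

deg <- 2  # assumed degree; choose to suit the data
fit <- lm(y ~ poly(x, deg, raw = TRUE))
b <- unname(coef(fit))  # fitted curve: b[1] + b[2] * x + b[3] * x^2

# Analytic derivative of the fitted polynomial at a single point v
k <- seq_len(deg)
dpoly <- function(v) sum(k * b[-1] * v^(k - 1))

# Rate of change at the last observed date
x_last <- max(x)
slope_per_day <- dpoly(x_last)
```

slope_per_day is in data units per day at the last observation. Substituting as.numeric(Sys.Date()) - x0 for x_last would give the slope as of today, though that extrapolates the polynomial beyond the data and should be treated with caution.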

Source: https://stackoverflow.com/questions/25812673/r-time-series-with-duplicate-time-index-entries