I have several variables at annual frequency in R that I would like to include in a regression analysis with other variables available at quarterly frequency. Additionally,
We could manipulate the output of na.spline to ensure that it averages to the annual values, either by shifting all 4 quarters' values or by shifting only the last 3 quarters' values. In the first case we subtract the mean of the 4 quarters from each quarter and then add the annual value to each quarter. In the second case we leave the first quarter at the annual value, subtract the mean of the last 3 quarters from each of those 3 quarters, and then add the annual value to them.
In each case, averaging the z_q_adj values over the four quarters of a year recovers the original annual value.
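The snippets that follow operate on a zoo object named c with columns z_a and z_q, whose construction is not shown here. A minimal sketch of how such an object might be built, assuming the annual values sit in the first quarter of each year with NA elsewhere (the dates and the values 100, 110, 111 are taken from the output shown further down; the construction itself is an assumption):

```r
library(zoo)

# Quarterly dates for 2000-2002 (assumed from the printed output)
tt <- seq(as.Date("2000-01-01"), as.Date("2002-10-01"), by = "quarter")

# Annual values in Q1 of each year, NA in the remaining quarters
z_a <- zoo(c(100, NA, NA, NA, 110, NA, NA, NA, 111, NA, NA, NA), tt)

# Cubic-spline interpolation fills the NA quarters; the known points are kept
z_q <- na.spline(z_a)

# Two-column zoo object with columns z_a and z_q
c <- merge(z_a, z_q)
```

The interpolated values depend on the spline method and on the number of annual points, so they need not match the figures printed in this answer. The adjustment steps work the same way regardless.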
Here are the two approaches mentioned:
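# 1: shift all four quarters so that their mean equals the annual value
# yr holds the calendar year of each observation and is the grouping key for ave()
yr <- format(time(c), "%Y")
# x[1] is the first-quarter value, which equals the annual value
c$z_q_adj <- ave(coredata(c$z_q), yr, FUN = function(x) x - mean(x) + x[1])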
giving:
> c
           z_a      z_q   z_q_adj
2000-01-01 100 100.0000  95.36604
2000-04-01  NA 103.4434  98.80946
2000-07-01  NA 106.4080 101.77405
2000-10-01  NA 108.6844 104.05046
2001-01-01 110 110.0000 109.39295
2001-04-01  NA 110.5723 109.96527
2001-07-01  NA 110.8719 110.26484
2001-10-01  NA 110.9840 110.37694
2002-01-01 111 111.0000 110.86116
2002-04-01  NA 111.0150 110.87615
2002-07-01  NA 111.1219 110.98311
2002-10-01  NA 111.4184 111.27958
# 2: keep the first quarter and shift only the last three quarters
c$z_q_adj <- ave(coredata(c$z_q), yr, FUN = function(x) c(x[1], x[-1] - mean(x[-1]) + x[1]))
giving:
> c
           z_a      z_q  z_q_adj
2000-01-01 100 100.0000 100.0000
2000-04-01  NA 103.4434  97.2648
2000-07-01  NA 106.4080 100.2294
2000-10-01  NA 108.6844 102.5058
2001-01-01 110 110.0000 110.0000
2001-04-01  NA 110.5723 109.7629
2001-07-01  NA 110.8719 110.0625
2001-10-01  NA 110.9840 110.1746
2002-01-01 111 111.0000 111.0000
2002-04-01  NA 111.0150 110.8299
2002-07-01  NA 111.1219 110.9368
2002-10-01  NA 111.4184 111.2333
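The claim that both adjustments average back to the annual value can be checked directly. A small sketch using the z_q figures printed above, with plain vectors standing in for the zoo columns (yr, annual, adj1 and adj2 are illustrative names, not from the original):

```r
# z_q values copied from the printed output; yr marks the year of each quarter
z_q <- c(100.0000, 103.4434, 106.4080, 108.6844,
         110.0000, 110.5723, 110.8719, 110.9840,
         111.0000, 111.0150, 111.1219, 111.4184)
yr <- rep(2000:2002, each = 4)
annual <- c(100, 110, 111)

# Approach 1: shift all four quarters
adj1 <- ave(z_q, yr, FUN = function(x) x - mean(x) + x[1])
# Approach 2: keep Q1, shift the last three quarters
adj2 <- ave(z_q, yr, FUN = function(x) c(x[1], x[-1] - mean(x[-1]) + x[1]))

# Per-year means of the adjusted series
m1 <- as.vector(tapply(adj1, yr, mean))
m2 <- as.vector(tapply(adj2, yr, mean))

# Both should equal the annual values up to floating-point tolerance
stopifnot(isTRUE(all.equal(m1, annual)), isTRUE(all.equal(m2, annual)))
```

This works because the spline passes through the known points, so x[1] within each year is the annual value itself.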
ADDED: If you want to know whether a series was interpolated or not, some approaches are:
add a comment to the series, e.g. comment(c) <- "Originally annual", or
use a naming convention, e.g. add _a to the series name if it was originally annual: c_a <- c, or
if it's OK to retain both the z_q and z_q_adj columns, then for series that originated from quarterly data the two columns should be the same and otherwise not, or
keep a column for both the original data and the quarterly data
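Two of these options can be sketched in a few lines. This is only an illustration on a plain data frame, using the first year of the figures above; the object name d and the column name interpolated are hypothetical:

```r
# Illustrative data: annual value in Q1, interpolated quarterly values
d <- data.frame(
  z_a = c(100, NA, NA, NA),
  z_q = c(100, 103.4434, 106.4080, 108.6844)
)

# Option: attach a free-text note to the object and read it back
comment(d) <- "Originally annual"
note <- comment(d)

# Option: keep the original column and derive a per-row flag from it;
# TRUE marks quarters that had no observed value
d$interpolated <- is.na(d$z_a)
```

Since comment() sets an ordinary attribute, the same call can be applied to a zoo object as suggested in the list.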