Convert from annual to quarterly data, constrained to annual average

孤人 submitted on 2019-12-01 06:50:21

A bit late here, but the tempdisagg package does what you want. It ensures that either the sum, the average, the first or the last value of the resulting high frequency series is consistent with the low frequency series.

It also allows you to use external indicator series, e.g., by the Chow-Lin technique. If you don't have an indicator, the Denton-Cholette method produces a better result than the method in EViews.

Here's your example:

# need ts object as input
z_a <- ts(c(100, 110, 111), start = 2000)

library(tempdisagg)
z_q <- predict(td(z_a ~ 1, method = "denton-cholette", conversion = "average"))

z_q
#           Qtr1      Qtr2      Qtr3      Qtr4
# 2000  97.65795  98.59477 100.46841 103.27887
# 2001 107.02614 109.71460 111.34423 111.91503
# 2002 111.42702 111.06100 110.81699 110.69499

# which has the same means as your original series:

tapply(z_q, floor(time(z_q)), mean)
# 2000 2001 2002 
#  100  110  111 
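As a cross-check, the same comparison can be done with the package's own aggregation function rather than tapply. This is a minimal sketch, assuming tempdisagg is installed; z_back is a name introduced here for illustration only.

```r
library(tempdisagg)

# same annual input as above
z_a <- ts(c(100, 110, 111), start = 2000)
z_q <- predict(td(z_a ~ 1, method = "denton-cholette", conversion = "average"))

# aggregate the quarterly series back to annual averages
# (z_back is a hypothetical name, not part of the original answer)
z_back <- ta(z_q, conversion = "average", to = "annual")

# should be TRUE if the quarterly series respects the annual averages
isTRUE(all.equal(as.numeric(z_back), as.numeric(z_a)))
```

Changing conversion to "sum", "first" or "last" in both calls should give the corresponding check for the other constraints mentioned above.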

Perhaps I'm missing something here, but assuming the annual value always comes from the first quarter, couldn't you just replace mean in your aggregate call with min?

 > d <- aggregate(c, as.integer(format(index(c),"%Y")), min, na.rm=TRUE)
 > d
      z_a z_q
 2000 100 100
 2001 110 110
 2002 111 111

We could manipulate the output of na.spline to ensure that it averages to the annual values, either by shifting all 4 quarters' values or by shifting only the last 3 quarters' values. In the first case we subtract the mean of the 4 quarters from each quarter and then add the annual value to each quarter. In the second case we leave the first quarter unchanged, subtract the mean of the last 3 quarters from each of them and add the annual value. In the code below the annual value is taken as x[1], because the first-quarter z_q value equals z_a.

In each case averaging the z_q_adj values over the four quarters of a year will recover the original annual value.

Here are the two approaches mentioned:

# 1
yr <- format(time(c), "%Y")
c$z_q_adj <- ave(coredata(c$z_q), yr, FUN = function(x) x - mean(x) + x[1])

giving:

> c
           z_a      z_q   z_q_adj
2000-01-01 100 100.0000  95.36604
2000-04-01  NA 103.4434  98.80946
2000-07-01  NA 106.4080 101.77405
2000-10-01  NA 108.6844 104.05046
2001-01-01 110 110.0000 109.39295
2001-04-01  NA 110.5723 109.96527
2001-07-01  NA 110.8719 110.26484
2001-10-01  NA 110.9840 110.37694
2002-01-01 111 111.0000 110.86116
2002-04-01  NA 111.0150 110.87615
2002-07-01  NA 111.1219 110.98311
2002-10-01  NA 111.4184 111.27958


# 2
c$z_q_adj <- ave(coredata(c$z_q), yr, FUN = function(x) c(x[1], x[-1] - mean(x[-1]) +x[1]))

giving:

> c
           z_a      z_q  z_q_adj
2000-01-01 100 100.0000 100.0000
2000-04-01  NA 103.4434  97.2648
2000-07-01  NA 106.4080 100.2294
2000-10-01  NA 108.6844 102.5058
2001-01-01 110 110.0000 110.0000
2001-04-01  NA 110.5723 109.7629
2001-07-01  NA 110.8719 110.0625
2001-10-01  NA 110.9840 110.1746
2002-01-01 111 111.0000 111.0000
2002-04-01  NA 111.0150 110.8299
2002-07-01  NA 111.1219 110.9368
2002-10-01  NA 111.4184 111.2333
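The claim that both adjustments recover the annual values can be checked directly. The following is a rough sketch in base R only, assuming the rounded z_q values printed above; yr_id, adj1 and adj2 are names introduced here for illustration.

```r
# z_q values copied (rounded) from the output above
z_q <- c(100.0000, 103.4434, 106.4080, 108.6844,
         110.0000, 110.5723, 110.8719, 110.9840,
         111.0000, 111.0150, 111.1219, 111.4184)
yr_id  <- rep(2000:2002, each = 4)   # hypothetical year index, 4 quarters each
annual <- c(100, 110, 111)

# approach 1: shift all four quarters of each year
adj1 <- ave(z_q, yr_id, FUN = function(x) x - mean(x) + x[1])

# approach 2: keep the first quarter, shift only the last three
adj2 <- ave(z_q, yr_id, FUN = function(x) c(x[1], x[-1] - mean(x[-1]) + x[1]))

# yearly means of the adjusted series, to compare against annual
tapply(adj1, yr_id, mean)
tapply(adj2, yr_id, mean)
```

Approach 2 keeps the first quarter equal to the annual value, at the cost of a jump between the first and second quarter of each year, which is visible in the output above.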

ADDED: If you want to know whether a series was interpolated or not, some approaches are:

  • add a comment to the series, e.g. comment(c) <- "Originally annual", or

  • use a naming convention, e.g. add _a to the series name if it was originally annual: c_a <- c, or

  • if it's OK to retain both the z_q and z_q_adj columns then for series that originated from quarterly data the two columns should be the same and otherwise not, or

  • keep a column for both the original data and the quarterly data
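
The first option in the list above can be sketched as follows. This is only an illustration using base R's comment() on a plain ts object; the series and the comment text are placeholders.

```r
# annual series from the example above
z_a <- ts(c(100, 110, 111), start = 2000)

# attach a note recording where the data came from
comment(z_a) <- "Originally annual"

# the note is not shown when printing, but can be retrieved later
comment(z_a)
```

Note that the comment is stored as an attribute, so it may be lost when the object is passed through functions that rebuild it.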
