Time-series - data splitting and model evaluation

粉色の甜心 2020-12-07 16:51

I've tried to use machine learning to make predictions based on time-series data. In one of the Stack Overflow questions (createTimeSlices function in CARET package in R) is a

3 answers
  •  广开言路
    2020-12-07 17:42

    Shambho's answer provides a decent example of how to use the caret package with TimeSlices; however, it can be misleading in terms of modelling technique. So as not to misguide future readers who want to use the caret package for predictive modelling on time-series (and here I do not mean autoregressive models), I want to highlight a few things.

    The problem with time-series data is that look-ahead bias is easy to introduce if one is not careful. In this case, the economics data set aligns its data by economic reporting date rather than by release date, which is never the case in real-life applications (economic data points have different time stamps). Unemployment data may be two months behind the other indicators in terms of release date, which would then bias the model in Shambho's example.
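To make the release-date point concrete, here is a minimal sketch in plain Python (not caret code). The helper names `add_months` and `latest_available`, the two-month lag, and the values are all made up for illustration; the idea is only that a figure may be used as a predictor once its release date has passed, not merely its reporting date.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` after d's month."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)


def latest_available(series, as_of, release_lag_months):
    """Most recent value of `series` that was already published on `as_of`.

    series: dict mapping reporting date -> value.
    release_lag_months: assumed delay between reporting date and release.
    Returns None when nothing had been released yet.
    """
    known = {
        report: value
        for report, value in series.items()
        if add_months(report, release_lag_months) <= as_of
    }
    if not known:
        return None
    return known[max(known)]


# Made-up monthly figures keyed by reporting date (illustration only).
unemploy = {
    date(2020, 1, 1): 100,
    date(2020, 2, 1): 110,
    date(2020, 3, 1): 120,
}

as_of = date(2020, 3, 1)
naive = unemploy[as_of]  # uses the March figure, which is not released yet
honest = latest_available(unemploy, as_of, release_lag_months=2)  # January figure
```

With a two-month lag, the value a model could really have seen on 2020-03-01 is the January one, so joining indicators on the reporting date leaks two months of future information.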

    Next, this example is only descriptive statistics, not predictive (forecasting), because the data we want to forecast (unemploy) is not lagged correctly. It merely trains a model to best explain the variation in unemployment (which in this case is also a non-stationary time-series, creating all sorts of issues in the modelling process) based on predictor variables at the same economic report dates.
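A small sketch of what "lagged correctly" could mean, again in plain Python rather than caret. The function name `make_supervised` and the toy numbers are assumptions of mine: predictors observed at time t are paired with the target at time t + horizon, so the model is fitted to forecast rather than to explain same-date variation.

```python
def make_supervised(predictors, target, horizon):
    """Pair predictor rows at time t with the target value at time t + horizon.

    predictors: list of feature rows, one per time step, in time order.
    target: list of target values, same length and order.
    horizon: number of steps ahead to forecast (must be >= 1).
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 to forecast")
    if len(predictors) != len(target):
        raise ValueError("predictors and target must have the same length")
    n = max(len(target) - horizon, 0)
    X = predictors[:n]      # the last `horizon` rows have no future target yet
    y = target[horizon:]    # target shifted forward by `horizon` steps
    return X, y


# Toy data (illustration only): one predictor column, four time steps.
predictors = [[1.0], [2.0], [3.0], [4.0]]
target = [10, 20, 30, 40]

X, y = make_supervised(predictors, target, horizon=1)
```

Here the row observed at step 0 is matched with the target at step 1, and the final predictor row is dropped because its future target is not known yet.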

    Lastly, the 12-month horizon in this example is not true multi-period forecasting as Hyndman does it in his examples.

    Hyndman on cross-validation for time-series
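For readers who want to see the rolling-origin idea spelled out, below is a rough sketch in plain Python. It is not caret's implementation and not Hyndman's code: `time_slices`, `naive_forecast` and `mae_by_step` are hypothetical names, the arguments only loosely mirror those of createTimeSlices, and the "model" is a placeholder that repeats the last observed value. The point is that each forecast is made from one origin for every step of the horizon, and the error is tracked separately per step ahead.

```python
def time_slices(n_obs, initial_window, horizon=1, fixed_window=True):
    """Build (train_idx, test_idx) pairs; every test index follows the train indices."""
    if initial_window < 1 or horizon < 1:
        raise ValueError("initial_window and horizon must be at least 1")
    slices = []
    for end in range(initial_window, n_obs - horizon + 1):
        # A fixed window slides forward; otherwise the window grows from index 0.
        start = end - initial_window if fixed_window else 0
        slices.append((list(range(start, end)), list(range(end, end + horizon))))
    return slices


def naive_forecast(history, horizon):
    """Placeholder model: repeat the last observed value for every step ahead."""
    return [history[-1]] * horizon


def mae_by_step(series, slices, forecast=naive_forecast):
    """Mean absolute error for each step ahead, averaged over all origins."""
    if not slices:
        raise ValueError("no slices to evaluate")
    horizon = len(slices[0][1])
    totals = [0.0] * horizon
    for train_idx, test_idx in slices:
        preds = forecast([series[i] for i in train_idx], horizon)
        for step, (i, pred) in enumerate(zip(test_idx, preds)):
            totals[step] += abs(series[i] - pred)
    return [total / len(slices) for total in totals]


# Toy series (illustration only).
series = [1, 2, 3, 4, 5, 6]
slices = time_slices(len(series), initial_window=3, horizon=2)
errors = mae_by_step(series, slices)
```

On this toy series the error at two steps ahead is larger than at one step ahead, which is the kind of per-horizon picture a single pooled 12-month score hides.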
