How do I avoid time leakage in my KNN model?

心已入冬 submitted on 2019-12-04 11:24:23

In caret, createTimeSlices implements a variation of cross-validation adapted to time series. It avoids time leakage by rolling the forecasting origin, so each resample trains only on rows that come before its test rows. See ?createTimeSlices in the caret documentation.
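To make "rolling the forecasting origin" concrete, here is a minimal base-R sketch of the index lists such a scheme produces. The helper rolling_origin_slices() is hypothetical and only mimics the idea behind createTimeSlices(). It is not caret's code, and it assumes the rows are already sorted by time.

```r
# Hypothetical helper (not part of caret): build train/test row indices for
# rolling-origin evaluation on n time-ordered rows.
rolling_origin_slices <- function(n, initial_window, horizon = 1,
                                  fixed_window = FALSE) {
  stopifnot(n - horizon >= initial_window)
  # Each origin is the last training row; leave room for a full test horizon
  origins <- seq(initial_window, n - horizon)
  list(
    train = lapply(origins, function(o) {
      # An expanding window always starts at row 1; a fixed window keeps its size
      start <- if (fixed_window) o - initial_window + 1 else 1
      seq(start, o)
    }),
    # The test rows are the `horizon` rows right after the origin
    test = lapply(origins, function(o) seq(o + 1, o + horizon))
  )
}

slices <- rolling_origin_slices(n = 14, initial_window = 10, horizon = 1)
length(slices$train)  # 4
slices$train[[1]]     # rows 1 to 10
slices$test[[1]]      # row 11
slices$train[[4]]     # rows 1 to 13
slices$test[[4]]      # row 14
```

The slices are defined by row position only, so two rows that share a date can still end up on opposite sides of a train/test split.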

In your case, depending on your precise needs, you could use something like this for a proper cross-validation:

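library(caret)
library(dplyr)

your_data <- your_data %>% arrange(close_date)

# createTimeSlices() returns lists of row indices ($train, $test),
# not a trainControl object
tr_ctrl <- createTimeSlices(
  your_data$close_price,
  initialWindow = 10,
  horizon = 1,
  fixedWindow = FALSE)

model <- train(
  close_price ~ ., data = your_data, method = "knn",
  # pass the time slices to train() as explicit resampling indices
  trControl = trainControl(index = tr_ctrl$train, indexOut = tr_ctrl$test),
  preProcess = c("center", "scale"),
  tuneLength = 10
)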
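What train() does with those slices can be sketched in base R: fit on each training window, predict the rows right after it, and pool the errors. knn_predict() and rolling_knn_rmse() are hypothetical helpers written for illustration (plain Euclidean KNN regression, one fixed k, no tuning), not caret's internals. The centring and scaling statistics come from the training window only, mirroring how caret estimates preProcess within each resample.

```r
# Hypothetical helpers for illustration only; not caret's implementation.

# Plain Euclidean k-nearest-neighbour regression. Centring/scaling statistics
# are computed from the training rows only, so the test rows cannot leak in.
knn_predict <- function(x_train, y_train, x_new, k) {
  mu  <- colMeans(x_train)
  sdv <- apply(x_train, 2, sd)
  sdv[sdv == 0] <- 1                       # guard against constant columns
  xt <- scale(x_train, center = mu, scale = sdv)
  xn <- scale(x_new,   center = mu, scale = sdv)
  k  <- min(k, nrow(xt))
  apply(xn, 1, function(row) {
    d <- sqrt(colSums((t(xt) - row)^2))    # distance to every training row
    mean(y_train[order(d)[seq_len(k)]])    # average of the k nearest targets
  })
}

# Expanding-window evaluation: train on rows 1..o, predict rows o+1..o+horizon,
# then move the origin forward by one row. Assumes rows are sorted by time.
rolling_knn_rmse <- function(x, y, initial_window, horizon = 1, k = 3) {
  x <- as.matrix(x)
  origins <- seq(initial_window, length(y) - horizon)
  errs <- unlist(lapply(origins, function(o) {
    tr <- seq_len(o)
    te <- seq(o + 1, o + horizon)
    pred <- knn_predict(x[tr, , drop = FALSE], y[tr],
                        x[te, , drop = FALSE], k)
    y[te] - pred
  }))
  sqrt(mean(errs^2))
}

# Toy trend y = 2x: with k = 1 the nearest neighbour is always the last
# training row, so every one-step forecast is off by exactly 2.
rolling_knn_rmse(x = 1:20, y = 2 * (1:20), initial_window = 10, k = 1)  # 2
```

The toy trend also shows a property of KNN worth keeping in mind for price data: it cannot predict values outside the range of the training targets, so on a trending series its forecasts lag behind.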

EDIT: if there are ties in the dates and you want to avoid having deals that closed on the same day in both the training and test sets, you can filter tr_ctrl before using it in train:

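library(lubridate)  # for as_date()

# Keep only the training rows that closed strictly before the
# earliest close date in the matching test rows
filter_train <- function(i_tr, i_te) {
  d_tr <- as_date(your_data$close_date[i_tr])
  d_te <- as_date(your_data$close_date[i_te])
  tr_is_ok <- d_tr < min(d_te)

  i_tr[tr_is_ok]
}

# SIMPLIFY = FALSE guarantees a list of index vectors, even if every
# filtered training set happens to have the same length
tr_ctrl$train <- mapply(filter_train, tr_ctrl$train, tr_ctrl$test, SIMPLIFY = FALSE)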
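To see why this filter matters, here is a small base-R toy example with made-up dates. drop_same_day() is a hypothetical, dependency-free variant of filter_train() that takes the dates as an argument instead of reading your_data (and uses base as.Date rather than lubridate). The slices are hand-written row positions of the kind createTimeSlices() produces.

```r
# Made-up close dates, already sorted; several deals close on the same day
close_date <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-02",
                        "2019-01-03", "2019-01-03", "2019-01-04"))

# Hypothetical variant of filter_train(): keep training rows that closed
# strictly before the earliest test date
drop_same_day <- function(i_tr, i_te, dates) {
  i_tr[dates[i_tr] < min(dates[i_te])]
}

# Row-position slices (expanding window, horizon = 1)
slices <- list(train = list(1:3, 1:4, 1:5),
               test  = list(4, 5, 6))

# Slice 2 trains on rows 1-4 and tests on row 5, but rows 4 and 5 both
# closed on 2019-01-03, so row 4 is dropped from that training set
slices$train <- mapply(drop_same_day, slices$train, slices$test,
                       MoreArgs = list(dates = close_date),
                       SIMPLIFY = FALSE)
# slices$train is now list(1:3, 1:3, 1:5)
```

If every row in a training window shares the test date, the filtered training set would be empty; this sketch does not guard against that case.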