How to implement walk-forward testing in sklearn?

Submitted by 点点圈 on 2019-12-20 08:49:00

Question


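In sklearn, GridSearchCV can take a pipeline as a parameter to find the best estimator through cross-validation. However, the usual (k-fold) cross-validation rotates the test fold through the whole dataset, so some of the training data comes after the test data in time.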

To cross-validate time series data, the training and test data are usually split differently: each test set immediately follows its training set, and the split moves forward through time.

That is to say, the test data should always come after the training data.
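A minimal sketch contrasting the two splitting schemes on 12 time-ordered samples, using scikit-learn's `KFold` and `TimeSeriesSplit` (the toy data is an assumption for illustration). With plain k-fold, the first test fold is trained on later samples; with `TimeSeriesSplit`, every test fold lies strictly after its training indices.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 samples, assumed to be in time order

# Ordinary k-fold: the test fold moves through the data, so training
# indices can come after the test indices.
kfold_splits = list(KFold(n_splits=3).split(X))

# Time-series split: each test fold comes strictly after its training data
# (expanding training window).
ts_splits = list(TimeSeriesSplit(n_splits=3).split(X))

for name, splits in [("KFold", kfold_splits), ("TimeSeriesSplit", ts_splits)]:
    for train, test in splits:
        print(f"{name:16s} train={train.tolist()} test={test.tolist()}")
```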

I have two ideas:

  1. Write my own k-fold-style class and pass it to GridSearchCV, so I can keep the convenience of the pipeline. The problem is that it seems difficult to make GridSearchCV use specified training and test indices.

  2. Write a new class, GridSearchWalkForwardTest, similar to GridSearchCV. I am studying the source code of grid_search.py and find it a little complicated.
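For idea 1, note that `GridSearchCV`'s `cv` parameter also accepts an iterable of `(train, test)` index arrays, so explicit walk-forward indices can be passed in without subclassing anything. The sketch below assumes a fixed-size rolling window; `walk_forward_splits`, the window sizes, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def walk_forward_splits(n_samples, train_size, test_size):
    """Hypothetical helper: yield (train, test) index arrays for a rolling window.

    Each training window holds exactly `train_size` samples and is followed
    immediately by `test_size` test samples; the window then moves forward
    by `test_size`.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Synthetic regression data, assumed to be in time order.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# Materialize the generator into a list so the splits can be reused safely.
cv = list(walk_forward_splits(len(X), train_size=40, test_size=10))
search = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]}, cv=cv)
search.fit(X, y)
print(search.best_params_)
```

Because the scaler sits inside the pipeline, it is refit on each training window only, so no information from the test block leaks into preprocessing.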

Any suggestion is welcome.


Answer 1:


I think you could use TimeSeriesSplit, either instead of your own implementation or as a basis for implementing a CV method that does exactly what you describe.
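A minimal sketch of passing `TimeSeriesSplit` directly as the `cv` argument of `GridSearchCV`, so the pipeline convenience is kept. The Ridge model, parameter grid, and synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed to be in time order.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 4))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=120)

# make_pipeline names each step after its lowercased class name ("ridge").
pipe = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(
    pipe,
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0]},
    # Expanding training window; each test fold comes after its training data.
    cv=TimeSeriesSplit(n_splits=5),
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```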

After digging around a bit, it seems someone added a max_train_size parameter to TimeSeriesSplit in this PR, which looks like it does what you want.
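A short sketch of how `max_train_size` turns the expanding window into a rolling one, assuming a scikit-learn version in which `TimeSeriesSplit` accepts that parameter. With 12 samples, 3 splits, and `max_train_size=3`, each test block of three samples is trained only on the three samples just before it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples (toy data)

# max_train_size caps the training window: only the most recent
# samples before each test fold are used for training.
tscv = TimeSeriesSplit(n_splits=3, max_train_size=3)
splits = list(tscv.split(X))
for train, test in splits:
    print("train:", train.tolist(), "test:", test.tolist())
```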




Answer 2:


My opinion is that you should try to implement your own GridSearchWalkForwardTest. I once used GridSearch for training, then implemented the same grid search myself, and did not get the same results, even though I should have.

In the end I used my own function. That gives you more control over the training and test sets, and over the parameters you train.
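A minimal sketch of such a hand-written function, assuming a fixed-size rolling window and mean squared error as the selection criterion. `walk_forward_grid_search` is a hypothetical name, not a scikit-learn API; it only uses the public `ParameterGrid`, `clone`, and `mean_squared_error` utilities.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def walk_forward_grid_search(estimator, param_grid, X, y, train_size, test_size):
    """Hypothetical helper: score every parameter setting with a rolling
    walk-forward loop. Returns (best_params, results), where results lists
    (params, mean test MSE) for each setting."""
    results = []
    for params in ParameterGrid(param_grid):
        errors = []
        start = 0
        while start + train_size + test_size <= len(X):
            train = slice(start, start + train_size)
            test = slice(start + train_size, start + train_size + test_size)
            # Fresh, unfitted copy per window so no fitted state leaks between folds.
            model = clone(estimator).set_params(**params)
            model.fit(X[train], y[train])
            errors.append(mean_squared_error(y[test], model.predict(X[test])))
            start += test_size  # move the window forward by one test block
        results.append((params, float(np.mean(errors))))
    best_params, _ = min(results, key=lambda r: r[1])  # lowest mean MSE wins
    return best_params, results


# Usage on synthetic, time-ordered data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
pipe = make_pipeline(StandardScaler(), Ridge())
best, results = walk_forward_grid_search(
    pipe, {"ridge__alpha": [0.1, 1.0, 10.0]}, X, y, train_size=40, test_size=10
)
print(best)
```

This sketch selects on mean squared error, while GridSearchCV by default uses the estimator's `score` method (R² for regressors), so the two can pick different parameters even on identical splits.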




Answer 3:


I did some work on all this a few months ago.

You could check it out in this question/answer:

Rolling window REVISITED - Adding window rolling quantity as a parameter- Walk Forward Analysis



Source: https://stackoverflow.com/questions/31947183/how-to-implement-walk-forward-testing-in-sklearn
