What does KFold in Python exactly do?

失恋的感觉 2020-12-15 07:35

I am looking at this tutorial: https://www.dataquest.io/mission/74/getting-started-with-kaggle

I got to part 9, making predictions. In there there is some data in a

3 Answers
  •  一个人的身影
    2020-12-15 07:55

    Sharing the theoretical information about K-fold (KF) that I have learnt so far.

    K-fold is a model validation technique that does not use your pre-trained model. Instead, it takes only the hyper-parameters, trains a new model on K-1 folds of the data, and tests that model on the Kth fold.

    The K different models are used only for validation.

    It returns K different scores (accuracy percentages), each based on the Kth (held-out) test fold. We generally take their average to analyse the model.
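To make the splitting concrete, here is a minimal pure-Python sketch of the idea. It is not scikit-learn's implementation; `k_fold_indices`, `cross_validate`, and the toy 1-nearest-neighbour model are hypothetical names made up for illustration. In scikit-learn the equivalent splitter is `sklearn.model_selection.KFold`, whose `split` method yields `(train_indices, test_indices)` pairs in the same spirit.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def cross_validate(xs, ys, fit, score, k):
    """Train a fresh model per fold and return the k held-out scores."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores


# Toy model (an assumption for the demo): 1-nearest neighbour on 1-D inputs.
def fit_1nn(xs, ys):
    # "Training" a 1-NN model just stores the labelled points.
    return list(zip(xs, ys))


def accuracy_1nn(model, xs, ys):
    hits = 0
    for x, y in zip(xs, ys):
        # Predict the label of the closest stored point.
        _, predicted = min(model, key=lambda pair: abs(pair[0] - x))
        hits += predicted == y
    return hits / len(xs)


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = cross_validate(xs, ys, fit_1nn, accuracy_1nn, k=5)
mean_score = sum(scores) / len(scores)
```

Each sample lands in exactly one test fold, so every data point is used for validation once and for training K-1 times. This sketch does not shuffle; with data sorted by label you would normally shuffle first.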

    We repeat this process for every model that we want to analyse. Brief algorithm:

    1. Split the data into training and test parts.
    2. Train the different models, say SVM, RF and LR, on this training data.
       2.a Take the whole data set and divide it into K folds.
       2.b Create a new model with the hyper-parameters obtained from the training above.
       2.c Fit the newly created model on K-1 folds.
       2.d Test it on the Kth fold.
       2.e Repeat for each fold and take the average score.
    3. Analyse the different average scores and select the best model out of SVM, RF and LR.
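The steps above can be sketched with scikit-learn roughly as follows. This is an illustration under assumptions, not the tutorial's code: the iris data set, the hyper-parameter values, and the variable names are all stand-ins. One deliberate difference from step 2.a: the sketch builds the folds from the training portion only, so the held-out test set stays untouched until the end, which is a common variant.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Illustrative data only: iris stands in for whatever data you have.
X, y = load_iris(return_X_y=True)

# Step 1: hold out a final test set that cross-validation never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 2: candidate models with fixed (assumed) hyper-parameters.
candidates = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}

# Shuffle before splitting; the seed keeps the folds repeatable.
folds = KFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for name, model in candidates.items():
    # cross_val_score clones the model, fits each clone on K-1 folds,
    # and scores it (accuracy for classifiers) on the remaining fold.
    scores = cross_val_score(model, X_train, y_train, cv=folds)
    mean_scores[name] = scores.mean()

# Step 3: pick the model with the best average score, refit it on all
# the training data, and check it once on the held-out test set.
best_name = max(mean_scores, key=mean_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
```

Note that the models fitted inside `cross_val_score` are thrown away; only their scores are kept, which matches the point that the K models exist purely for validation.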

    The simple reason for doing this: we generally do not have enough data, and if we divide the whole data set into:

    1. Training
    2. Validation
    3. Testing

    We may be left with a relatively small chunk of data for training, which may cause the model to overfit. It is also possible that some of the data is never used for training, so we never analyse the model's behaviour against such data.

    K-fold overcomes both of these issues.
