When using multiple classifiers, how do you measure the ensemble's performance? [scikit-learn]

说谎 · 2021-02-06 06:28

I have a classification problem (predicting whether a sequence belongs to a class or not), for which I decided to use multiple classification methods, in order to help filter out …

2 Answers
    暗喜 (OP) · 2021-02-06 07:02

    To evaluate the performance of the ensemble, follow the same approach you would use for a single classifier, but build the 10-fold partitions of the data set first. For each fold, train every member of the ensemble on that same training split, measure the ensemble's accuracy on the held-out data, and repeat with the remaining folds, then average the fold accuracies. The key difference is that you do not run a separate k-fold cross-validation for each individual algorithm when evaluating the ensemble. The important thing is not to let the ensemble see the test data, either directly or by letting one of its member algorithms see it. A sketch of the procedure is below.
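    Here is a minimal sketch of that procedure, assuming a simple majority-vote combination; the particular base classifiers and the synthetic data are illustrative choices, not part of the original question:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
    from sklearn.model_selection import StratifiedKFold
    from sklearn.svm import SVC

    # Illustrative data: substitute your own sequence features and labels.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Fresh ensemble members for each fold, so no member carries
    # information from a previous fold's training data.
    def make_members():
        return [RandomForestClassifier(random_state=0),
                ExtraTreesClassifier(random_state=0),
                SVC(random_state=0)]

    fold_accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
        members = make_members()
        # Train every member on the SAME training fold...
        for clf in members:
            clf.fit(X[train_idx], y[train_idx])
        # ...then score the combined prediction (majority vote) on the
        # held-out fold, which no member has seen during training.
        votes = np.array([clf.predict(X[test_idx]) for clf in members])
        majority = (votes.mean(axis=0) > 0.5).astype(int)
        fold_accuracies.append((majority == y[test_idx]).mean())

    print("ensemble accuracy: %.3f +/- %.3f"
          % (np.mean(fold_accuracies), np.std(fold_accuracies)))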

    Note also that RF and Extra Trees are already ensemble algorithms in their own right.

    An alternative approach (again taking care that the ensemble never sees the test data) is to take the probabilities and/or labels output by your classifiers and feed them into another classifier (say a DT, RF, SVM, or whatever) that produces a prediction by combining the best guesses from these other classifiers. This is termed "stacking"; a sketch follows.
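    As a minimal sketch, scikit-learn's StackingClassifier implements this pattern directly; the base estimators and the logistic-regression meta-classifier below are assumptions for illustration, not prescribed by the answer:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import (StackingClassifier,
                                  RandomForestClassifier,
                                  ExtraTreesClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Base classifiers whose out-of-fold predictions become the
    # features of a second-level (meta) classifier.
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("et", ExtraTreesClassifier(random_state=0)),
                    ("svm", SVC(random_state=0))],
        final_estimator=LogisticRegression(),  # combines the base guesses
    )

    # Evaluate the whole stack exactly like a single classifier:
    # 10-fold CV keeps each test fold hidden from every layer.
    scores = cross_val_score(stack, X, y, cv=10)
    print("stacked accuracy: %.3f" % scores.mean())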
