Help Understanding Cross Validation and Decision Trees

Happy的楠姐 · 2020-12-12 16:29

I've been reading up on Decision Trees and Cross Validation, and I understand both concepts. However, I'm having trouble understanding Cross Validation as it pertains to Decision Trees.

6 Answers
  •  南笙 · 2020-12-12 17:17

    The problem I can't figure out is at the end you'll have k Decision trees that could all be slightly different because they might not split the same way, etc. Which tree do you pick?

    The purpose of cross validation is not to select a particular instance of the classifier (or decision tree, or whatever automatic learning algorithm), but rather to qualify the model, i.e. to provide metrics such as the average error ratio and the deviation relative to that average, which indicate the level of precision one can expect from the application. One thing cross validation can also help assert is whether the training data is large enough.
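    To make this concrete, here is a minimal sketch of k-fold cross-validation in plain Python. A toy one-feature threshold "stump" stands in for a real decision-tree learner, and the helper names (`train_stump`, `fold_error`, `k_fold_cv`) are hypothetical, not any library's API; the point is that the output is metrics about the learning procedure, not any single fitted tree.

    ```python
    from statistics import mean, stdev

    # Toy data: (feature, label) pairs.
    data = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1),
            (0.7, 1), (0.9, 1), (0.2, 0), (0.8, 1)]

    def train_stump(train):
        """Pick the threshold midway between the two class means."""
        lo = mean(x for x, y in train if y == 0)
        hi = mean(x for x, y in train if y == 1)
        return (lo + hi) / 2

    def fold_error(threshold, test):
        """Fraction of held-out points the stump misclassifies."""
        wrong = sum(1 for x, y in test if (x > threshold) != bool(y))
        return wrong / len(test)

    def k_fold_cv(data, k=4):
        errors = []
        for i in range(k):
            test = data[i::k]  # every k-th point is held out
            train = [d for j, d in enumerate(data) if j % k != i]
            errors.append(fold_error(train_stump(train), test))
        # Report metrics about the model family, not one fitted model.
        return mean(errors), stdev(errors)

    avg_err, err_sd = k_fold_cv(data)
    ```

    The k trained stumps are thrown away afterwards; only `avg_err` and `err_sd` are kept, as the answer describes.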

    With regard to selecting a particular tree, you should instead run another training pass on 100% of the available training data, as this typically produces a better tree. (The downside of the cross-validation approach is that we need to divide the [typically small] amount of training data into "folds", and, as you hint in the question, this can lead to trees that are either overfit or underfit for particular data instances.)
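    The "final fit" step can be sketched in a few lines, again assuming a hypothetical one-feature threshold learner (`train_stump`) in place of a real tree inducer: once cross-validation has qualified the procedure, the deployed model is trained once on all of the data.

    ```python
    from statistics import mean

    # All available training data: (feature, label) pairs.
    data = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1),
            (0.7, 1), (0.9, 1), (0.2, 0), (0.8, 1)]

    def train_stump(train):
        """Pick the threshold midway between the two class means."""
        lo = mean(x for x, y in train if y == 0)
        hi = mean(x for x, y in train if y == 1)
        return (lo + hi) / 2

    # Train on 100% of the data; no fold is held out this time.
    final_threshold = train_stump(data)
    predict = lambda x: int(x > final_threshold)
    ```

    The cross-validation metrics computed earlier are then the best available estimate of how this single, fully-trained model will behave on unseen data.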

    In the case of decision trees, I'm not sure what your reference to statistics gathered in the nodes and used to prune the tree refers to. Perhaps a particular use of cross-validation-related techniques?
