Building a Decision Tree

Question


When building a decision tree, at each node we select the best feature and then the best splitting point for that feature. But what should I do when every sample in the current node/set has the value 0 for the best feature? All samples keep being grouped to one side (the <= 0 branch), the recursion never makes progress, and an infinite loop occurs. For example:

#left: 1500, #right: 0

then,

#left: 1500, #right: 0

and so on...

For reference, this is the pseudocode I'm following.

GrowTree(S)
  if (y_i = C for all i in S and some class C) then {
    return new leaf(C)
  } else {
    choose best splitting feature j and splitting point beta (*)
      // (*) I choose the (j, beta) pair that gives the maximum entropy drop
    S_l = {i : X_ij < beta}
    S_r = {i : X_ij >= beta}
    return new node(j, beta, GrowTree(S_l), GrowTree(S_r))
  }
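
For concreteness, here is a minimal runnable Python sketch of this pseudocode (the helper names entropy, best_split, Leaf, and Node are my own, not from the original). The guard that returns a majority-class leaf when no split gives a positive entropy drop is the fix described in the answer below; without it, a constant feature could send all samples to one branch and recurse forever:

import numpy as np
from collections import Counter
from dataclasses import dataclass

@dataclass
class Leaf:
    label: object

@dataclass
class Node:
    feature: int
    beta: float
    left: object
    right: object

def entropy(y):
    # Shannon entropy of a label vector.
    counts = np.array(list(Counter(y).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(X, y):
    # Return (gain, j, beta) with the maximum entropy drop.
    # Candidate thresholds skip each feature's minimum value, so both
    # sides of every candidate split are guaranteed non-empty.
    base = entropy(y)
    best_gain, best_j, best_beta = 0.0, None, None
    for j in range(X.shape[1]):
        for beta in np.unique(X[:, j])[1:]:
            mask = X[:, j] < beta
            children = (mask.mean() * entropy(y[mask])
                        + (~mask).mean() * entropy(y[~mask]))
            gain = base - children
            if gain > best_gain:
                best_gain, best_j, best_beta = gain, j, beta
    return best_gain, best_j, best_beta

def grow_tree(X, y):
    if len(set(y)) == 1:                  # pure node: make a leaf
        return Leaf(y[0])
    gain, j, beta = best_split(X, y)
    if j is None:                         # no split improves certainty
        return Leaf(Counter(y).most_common(1)[0][0])
    mask = X[:, j] < beta
    return Node(j, beta, grow_tree(X[mask], y[mask]),
                grow_tree(X[~mask], y[~mask]))

Note that best_split only enumerates thresholds strictly above each feature's minimum, so a split of the form "#left: 1500, #right: 0" can never even be proposed.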


Answer 1:


This is simply impossible if the split selection is implemented correctly. You are supposed to select the threshold that leads to the biggest increase in model certainty, i.e. the largest entropy drop. A threshold that puts every single instance into the same branch gives a 0 increase in certainty, so it is never the best split. This situation can arise if and only if the impurity/entropy of the node is already 0, but that is precisely a stopping criterion: you create a leaf there instead of recursing.
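
A quick arithmetic check (with an illustrative 40/60 class mix) makes this concrete: when all 1500 samples land on one side, the weighted child entropy equals the parent entropy, so the entropy drop is exactly zero.

import numpy as np

def binary_entropy(p):
    # Entropy of a two-class node with positive-class probability p.
    if p in (0.0, 1.0):
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

parent = binary_entropy(0.4)          # say 600 of 1500 samples are positive
# Degenerate split: 1500 left, 0 right. The left child is identical to
# the parent, and the empty right child contributes nothing.
children = (1500 / 1500) * binary_entropy(0.4) + (0 / 1500) * 0.0
print(parent - children)              # 0.0 -> zero gain, never the best split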



Source: https://stackoverflow.com/questions/36457849/building-a-decision-tree
