Scikit Decision tree categorical features

↘锁芯ラ submitted on 2019-12-24 11:37:53

Question


There is a well-known problem in Tom Mitchell's Machine Learning book: build a decision tree from the following data, where Play ball is the target variable.

The resulting tree is the following:

I wonder whether it's possible to build this tree with scikit-learn. I found several examples where a fitted decision tree clf can be rendered with

from sklearn.tree import export_graphviz
from graphviz import Source
export_graphviz(clf)
Source(export_graphviz(clf, out_file=None))

However, it looks like scikit-learn doesn't work well with categorical data: each categorical feature has to be binarized into several columns. As a result, it is impossible to build the tree exactly as in the picture. Is that correct?


Answer 1:


Yes, it is correct that it is impossible to build such a tree with scikit-learn.

The primary reason is that this is a ternary tree (nodes with up to three children), but scikit-learn implements only binary trees, in which every node has either exactly two children or none:

cdef class Tree:
    """Array-based representation of a binary decision tree.
...

However, it is possible to get an equivalent binary tree of the form

Outlook == Sunny
    true  => Humidity == High
        true  => no
        false => yes      
    false => Outlook == Overcast
        true  => yes
        false => Wind == Strong
            true  => no
            false => yes 
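The equivalence claimed above can be checked mechanically. The sketch below is plain Python rather than scikit-learn: it encodes the multiway tree as I recall it from Mitchell's book (the picture is not reproduced here, so the `Rain`, `Normal` and `Weak` values are assumptions) alongside the binary tree from this answer, and compares their predictions on every combination of attribute values. The names `MULTIWAY`, `BINARY`, `predict_multiway` and `predict_binary` are hypothetical helpers for illustration only.

```python
from itertools import product

# Multiway tree: a node is (attribute, {value: subtree}); a leaf is a label.
# Assumed to match the tree in the book's figure.
MULTIWAY = ("Outlook", {
    "Sunny": ("Humidity", {"High": "no", "Normal": "yes"}),
    "Overcast": "yes",
    "Rain": ("Wind", {"Strong": "no", "Weak": "yes"}),
})

# Binary tree from the answer: a node is ((attribute, value), if_true, if_false).
BINARY = (
    ("Outlook", "Sunny"),
    (("Humidity", "High"), "no", "yes"),
    (
        ("Outlook", "Overcast"),
        "yes",
        (("Wind", "Strong"), "no", "yes"),
    ),
)


def predict_multiway(tree, row):
    """Follow the branch matching the row's value until a leaf label is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[row[attribute]]
    return tree


def predict_binary(tree, row):
    """Answer one yes/no equality test per node until a leaf label is reached."""
    while not isinstance(tree, str):
        (attribute, value), if_true, if_false = tree
        tree = if_true if row[attribute] == value else if_false
    return tree


# Every combination of the attribute values used by the two trees.
all_rows = [
    {"Outlook": o, "Humidity": h, "Wind": w}
    for o, h, w in product(
        ["Sunny", "Overcast", "Rain"], ["High", "Normal"], ["Strong", "Weak"]
    )
]

agree = all(
    predict_multiway(MULTIWAY, row) == predict_binary(BINARY, row)
    for row in all_rows
)
print(agree)  # True
```

With one-hot encoded input, a test such as `Outlook == Sunny` would correspond to a threshold split on a single 0/1 indicator column, which is the only kind of split scikit-learn makes. Whether a fitted `DecisionTreeClassifier` picks these particular splits depends on the criterion and on tie-breaking, so it may return a different binary tree that classifies the same training data equally well.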


Source: https://stackoverflow.com/questions/47586562/scikit-decision-tree-categorical-features
