Retrieve Decision Boundary Lines (x,y coordinate format) from SKlearn Decision Tree

青春惊慌失措 2021-01-13 08:35

I am trying to create a surface plot on an external visualization platform. I'm working with the iris data set that is featured on the sklearn decision tree documentation p

3 Answers
  •  天命终不由人
    2021-01-13 09:30

    For those interested, I recently had to implement this for higher-dimensional data as well. The code was as follows:

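    import numpy as np

    # Assumes `tree` is a fitted sklearn decision tree and `x` is the training data.
    is_leaf = tree.tree_.children_left == -1
    number_of_leaves = is_leaf.sum()
    features = x.shape[1]
    boundaries = np.zeros([number_of_leaves, features, 2])
    boundaries[:,:,0] = -np.inf  # lower bound (min) per feature
    boundaries[:,:,1] = np.inf   # upper bound (max) per feature

    locs = np.where(is_leaf)[0]

    for k in range(locs.shape[0]):
        idx = locs[k]
        idx_new = idx

        # Walk from the leaf back up to the root (node 0).
        while idx_new != 0:
            i_check = np.where(tree.tree_.children_left == idx_new)[0]
            j_check = np.where(tree.tree_.children_right == idx_new)[0]

            if i_check.shape[0] == 1:
                # Reached via a left branch (x[feat_] <= threshold): tighten the max.
                idx_new = i_check[0]
                feat_ = tree.tree_.feature[idx_new]
                val_ = tree.tree_.threshold[idx_new]
                boundaries[k,feat_, 1] = min(boundaries[k,feat_, 1], val_)
            elif j_check.shape[0] == 1:
                # Reached via a right branch (x[feat_] > threshold): tighten the min.
                idx_new = j_check[0]
                feat_ = tree.tree_.feature[idx_new]
                val_ = tree.tree_.threshold[idx_new]
                boundaries[k,feat_, 0] = max(boundaries[k,feat_, 0], val_)
            else:
                print('Fail Case') # for debugging only - never occurs
                break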
    

    Essentially, I build an n × d × 2 tensor, where n is the number of leaves in the tree, d is the dimensionality of the feature space, and the third dimension holds the min and max values for each feature. Leaves are marked by -1 in tree.tree_.children_left / tree.tree_.children_right. For each leaf, I walk back up toward the root to find every split that led to it and add that split's criterion to the leaf's decision bounds.
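
    For comparison, here is a minimal self-contained sketch of the same idea done top-down instead of bottom-up: it carries the bounds from the root to each leaf, so no parent lookup is needed. This is a swapped-in traversal, not the code above. The helper name leaf_boxes, the choice of the first two iris features, and max_depth=3 are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def leaf_boxes(fitted_tree, n_features):
    """Return {leaf node id: (n_features, 2) array of [min, max] bounds}.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    t = fitted_tree.tree_
    boxes = {}
    start = np.column_stack([np.full(n_features, -np.inf),
                             np.full(n_features, np.inf)])
    stack = [(0, start)]  # (node id, bounds accumulated on the way down)
    while stack:
        node, bounds = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: child pointers are -1
            boxes[int(node)] = bounds
            continue
        feat, thr = t.feature[node], t.threshold[node]
        # Left branch holds samples with x[feat] <= thr: tighten the max.
        left_bounds = bounds.copy()
        left_bounds[feat, 1] = min(left_bounds[feat, 1], thr)
        # Right branch holds samples with x[feat] > thr: tighten the min.
        right_bounds = bounds.copy()
        right_bounds[feat, 0] = max(right_bounds[feat, 0], thr)
        stack.append((left, left_bounds))
        stack.append((right, right_bounds))
    return boxes


iris = load_iris()
X = iris.data[:, :2]  # two features, so each box is a rectangle in the plane
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, iris.target)
boxes = leaf_boxes(clf, X.shape[1])
```

    With two features, the four edges of each box are line segments in (x, y) coordinates. Bounds that are still infinite would need to be clipped to the plotting range before drawing.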
