R gbm handling of missing values

刺人心 2020-12-29 14:24

Does anyone know how gbm in R handles missing values? I can't seem to find any explanation using Google.

5 answers
  •  清歌不尽
    2020-12-29 14:33

    To explain what gbm does with missing predictors, let's first visualize a single tree of a gbm object.

    Suppose you have a gbm object mygbm. Calling pretty.gbm.tree(mygbm, i.tree=1) prints the first tree of mygbm as a table, e.g.:

      SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
    0       46  1.629728e+01        1         5           9      26.462908   1585 -4.396393e-06
    1       45  1.850000e+01        2         3           4      11.363868    939 -4.370936e-04
    2       -1  2.602236e-04       -1        -1          -1       0.000000    271  2.602236e-04
    3       -1 -7.199873e-04       -1        -1          -1       0.000000    668 -7.199873e-04
    4       -1 -4.370936e-04       -1        -1          -1       0.000000    939 -4.370936e-04
    5       20  0.000000e+00        6         7           8       8.638042    646  6.245552e-04
    6       -1  3.533436e-04       -1        -1          -1       0.000000    483  3.533436e-04
    7       -1  1.428207e-03       -1        -1          -1       0.000000    163  1.428207e-03
    8       -1  6.245552e-04       -1        -1          -1       0.000000    646  6.245552e-04
    9       -1 -4.396393e-06       -1        -1          -1       0.000000   1585 -4.396393e-06
    

    See the gbm documentation for details. Each row corresponds to a node, and the first (unnamed) column is the node number. Each node has a LeftNode and a RightNode (both set to -1 if the node is a leaf). Each non-leaf node also has an associated MissingNode.

    To run an observation down the tree, we start at node 0. If the observation has a missing value for SplitVar = 46, it is sent to the node given by MissingNode = 9. Node 9 is a leaf, so the tree's prediction for such an observation is that node's SplitCodePred = -4.396393e-06, which is the same prediction the tree had before any split was made at node 0 (Prediction = -4.396393e-06 for node 0).

    The procedure is similar for other nodes and split variables.
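
    The traversal described above can be sketched in base R. This is only an illustration, not gbm's own code: the table is the first tree from above, transcribed by hand, and predict_one_tree is a hypothetical helper. It assumes that every split variable is continuous, that values below SplitCodePred go to LeftNode, and that SplitVar is a 0-based index into the predictor vector.

```r
# First tree from the table above, transcribed by hand. Row i holds node i - 1.
tree <- data.frame(
  SplitVar      = c(46, 45, -1, -1, -1, 20, -1, -1, -1, -1),
  SplitCodePred = c(1.629728e+01, 1.850000e+01, 2.602236e-04, -7.199873e-04,
                    -4.370936e-04, 0.000000e+00, 3.533436e-04, 1.428207e-03,
                    6.245552e-04, -4.396393e-06),
  LeftNode      = c(1, 2, -1, -1, -1, 6, -1, -1, -1, -1),
  RightNode     = c(5, 3, -1, -1, -1, 7, -1, -1, -1, -1),
  MissingNode   = c(9, 4, -1, -1, -1, 8, -1, -1, -1, -1)
)

# Hypothetical helper: send one observation down a single tree.
# x is a numeric vector of predictors; SplitVar is treated as a 0-based index.
# Assumption: all splits are continuous and x < SplitCodePred goes left.
predict_one_tree <- function(tree, x) {
  node <- 0
  repeat {
    row <- tree[node + 1, ]
    if (row$SplitVar == -1) {
      return(row$SplitCodePred)      # leaf: SplitCodePred is the prediction
    }
    value <- x[row$SplitVar + 1]
    if (is.na(value)) {
      node <- row$MissingNode        # missing predictor: follow MissingNode
    } else if (value < row$SplitCodePred) {
      node <- row$LeftNode
    } else {
      node <- row$RightNode
    }
  }
}

# Observation with variable 46 missing: goes straight from node 0 to node 9.
x_missing_46 <- rep(0, 47)
x_missing_46[46 + 1] <- NA
pred_missing_46 <- predict_one_tree(tree, x_missing_46)

# Variable 46 observed (10 < 16.29728, so left to node 1), variable 45 missing:
# node 1 sends the observation to its MissingNode, node 4.
x_missing_45 <- rep(0, 47)
x_missing_45[46 + 1] <- 10
x_missing_45[45 + 1] <- NA
pred_missing_45 <- predict_one_tree(tree, x_missing_45)
```

    In both cases the observation ends in a leaf whose value equals the Prediction of the node where the missing value was encountered, i.e. the tree stops refining its prediction at that point.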
