How are feature_importances in RandomForestClassifier determined?

Asked by 梦毁少年i on 2020-11-30 16:39

I have a classification task with a time series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result …

6 Answers
  •  伪装坚强ぢ
    2020-11-30 17:03

    The usual way to compute the feature importance values of a single tree is as follows:

    1. you initialize an array feature_importances of all zeros with size n_features.

    2. you traverse the tree: for each internal node that splits on feature i, you compute the error reduction of that node multiplied by the number of samples that were routed to the node, and add this quantity to feature_importances[i].

    The error reduction depends on the impurity criterion that you use (e.g. Gini, entropy, MSE, ...). It is the impurity of the set of examples that gets routed to the internal node minus the sum of the impurities of the two partitions created by the split.
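As a small sketch of the quantity described above, the snippet below computes the Gini impurity and the weighted error reduction of a single split. Note that implementations such as scikit-learn's weight each child's impurity by its sample count (the function names here are illustrative, not library API):

```python
import numpy as np

def gini(y):
    # Gini impurity of a label array: 1 - sum of squared class proportions
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_impurity_decrease(y_parent, y_left, y_right):
    # error reduction of one split, multiplied by the number of samples
    # routed to the node (as in step 2 of the answer above)
    n, n_l, n_r = len(y_parent), len(y_left), len(y_right)
    decrease = gini(y_parent) - (n_l / n) * gini(y_left) - (n_r / n) * gini(y_right)
    return n * decrease

# a perfect split of a balanced binary node removes all impurity
print(weighted_impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # 2.0
```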

    It is important to note that these values are relative to a specific dataset (both the error reduction and the number of samples are dataset-specific), so these values cannot be compared between different datasets.

    As far as I know, there are alternative ways to compute feature importance values in decision trees. A brief description of the above method can be found in "Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
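The procedure in the answer can be reproduced against a fitted scikit-learn tree by walking its internal node arrays (`children_left`, `feature`, `impurity`, `weighted_n_node_samples` are documented attributes of the fitted `tree_` object). This is a sketch, not the library's own code, but after normalization it matches `feature_importances_`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def manual_feature_importances(tree):
    # accumulate the weighted error reduction of every internal node
    # onto the feature it splits on, then normalize to sum to 1
    t = tree.tree_
    importances = np.zeros(tree.n_features_in_)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, no contribution
            continue
        n = t.weighted_n_node_samples[node]
        n_l = t.weighted_n_node_samples[left]
        n_r = t.weighted_n_node_samples[right]
        decrease = (n * t.impurity[node]
                    - n_l * t.impurity[left]
                    - n_r * t.impurity[right])
        importances[t.feature[node]] += decrease
    return importances / importances.sum()

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(np.allclose(manual_feature_importances(clf), clf.feature_importances_))
```

For a RandomForestClassifier, `feature_importances_` is the average of these per-tree values over all trees in the forest.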
