How to get feature importance in xgboost?

情深已故 2020-12-13 07:00

I'm using xgboost to build a model and trying to find the importance of each feature using get_fscore(), but it returns {}

and here is my training code:

11 Answers
  • 2020-12-13 07:35

    I'm not sure how to get the values themselves, but there is a good way to plot feature importance:

    import matplotlib.pyplot as plt
    import xgboost as xgb

    # params, d_train and watchlist come from your own training setup
    model = xgb.train(params, d_train, 1000, watchlist)
    fig, ax = plt.subplots(figsize=(12, 18))
    xgb.plot_importance(model, max_num_features=50, height=0.8, ax=ax)
    plt.show()
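
    If you also need the numbers behind the plot: plot_importance uses importance_type='weight' by default, so the same values can be read directly from the trained booster, e.g.:

    # the dict maps feature name -> number of times it was used to split
    scores = model.get_score(importance_type='weight')
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))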
    
  • 2020-12-13 07:36

    In your code, you can get the feature importance of each feature in dict form:

    bst.get_score(importance_type='gain')
    
    >>{'ftr_col1': 77.21064539577829,
       'ftr_col2': 10.28690566363971,
       'ftr_col3': 24.225014841466294,
       'ftr_col4': 11.234086283060112}
    

    Explanation: get_score(), on the Booster object returned by the train() API, is defined as:

    get_score(fmap='', importance_type='weight')

    • fmap (str (optional)) – the name of the feature map file.
    • importance_type
      • ‘weight’ - the number of times a feature is used to split the data across all trees.
      • ‘gain’ - the average gain across all splits the feature is used in.
      • ‘cover’ - the average coverage across all splits the feature is used in.
      • ‘total_gain’ - the total gain across all splits the feature is used in.
      • ‘total_cover’ - the total coverage across all splits the feature is used in.

    https://xgboost.readthedocs.io/en/latest/python/python_api.html
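
    To compare the importance types side by side, here is a minimal sketch (assuming bst is the trained Booster from above and pandas is available):

    import pandas as pd

    # one column per importance type, one row per feature
    types = ['weight', 'gain', 'cover', 'total_gain', 'total_cover']
    importances = pd.DataFrame({t: bst.get_score(importance_type=t) for t in types})
    print(importances.sort_values('gain', ascending=False))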

  • 2020-12-13 07:36

    Try this (with recent versions of xgboost's sklearn wrapper, use get_booster() instead of the old booster() method):

    fscore = clf.best_estimator_.get_booster().get_fscore()
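
    For context, a hypothetical setup in which that line applies, where clf is a fitted hyper-parameter search (all names below are illustrative):

    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    clf = GridSearchCV(XGBClassifier(), {'max_depth': [3, 6]})
    clf.fit(X_train, y_train)  # X_train / y_train are assumed to exist
    fscore = clf.best_estimator_.get_booster().get_fscore()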
    
  • 2020-12-13 07:40

    For feature importance, try this:

    Classification:

    pd.DataFrame(list(bst.get_fscore().items()), columns=['feature', 'importance']).sort_values('importance', ascending=False)
    

    Regression:

    xgb.plot_importance(bst)
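
    Note that plot_importance accepts the same importance_type values as get_score, so the plotted metric can be switched; a small sketch, assuming a trained booster bst:

    import xgboost as xgb

    # plot average gain per split instead of the default split counts
    xgb.plot_importance(bst, importance_type='gain', show_values=False)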
    
  • 2020-12-13 07:41

    First, build the model with XGBoost:

    from xgboost import XGBClassifier, plot_importance
    model = XGBClassifier()
    model.fit(train, label)
    

    Fitting exposes model.feature_importances_, which is an array, so we can sort the indices in descending order:

    import numpy as np

    sorted_idx = np.argsort(model.feature_importances_)[::-1]
    

    Then print each sorted importance together with its column name (assuming the data was loaded with pandas):

    for index in sorted_idx:
        print([train.columns[index], model.feature_importances_[index]]) 
    

    Furthermore, we can plot the importances with XGBoost's built-in function:

    from matplotlib import pyplot

    plot_importance(model, max_num_features=15)
    pyplot.show()
    

    Use max_num_features in plot_importance to limit the number of features shown, if you want.
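
    One more detail: which metric feature_importances_ reports is controlled by the wrapper's importance_type parameter (its default varies across xgboost versions). A minimal sketch, reusing train and label from above:

    from xgboost import XGBClassifier

    # ask the sklearn wrapper for gain-based importances explicitly
    model = XGBClassifier(importance_type='gain')
    model.fit(train, label)
    print(model.feature_importances_)  # one value per column, normalized to sum to 1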
