How to get feature importance in xgboost?

后端未结

关注

 11  1774

情深已故 2020-12-13 07:00

I\'m using xgboost to build a model, and try to find the importance of each feature using get_fscore(), but it returns {}

and my train code

11条回答

半阙折子戏 (楼主)

2020-12-13 07:29
According to this post there 3 different ways to get feature importance from Xgboost:
- use built-in feature importance,
- use permutation based importance,
- use shap based importance.
Built-in feature importance

Code example:
```
xgb = XGBRegressor(n_estimators=100)
xgb.fit(X_train, y_train)
sorted_idx = xgb.feature_importances_.argsort()
plt.barh(boston.feature_names[sorted_idx], xgb.feature_importances_[sorted_idx])
plt.xlabel("Xgboost Feature Importance")
```
Please be aware of what type of feature importance you are using. There are several types of importance, see the docs. The scikit-learn like API of Xgboost is returning gain importance while get_fscore returns weight type.

Permutation based importance
```
perm_importance = permutation_importance(xgb, X_test, y_test)
sorted_idx = perm_importance.importances_mean.argsort()
plt.barh(boston.feature_names[sorted_idx], perm_importance.importances_mean[sorted_idx])
plt.xlabel("Permutation Importance")
```
This is my preferred way to compute the importance. However, it can fail in case highly colinear features, so be careful! It's using permutation_importance from scikit-learn.

SHAP based importance
```
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, plot_type="bar")
```
To use the above code, you need to have shap package installed.

I was running the example analysis on Boston data (house price regression from scikit-learn). Below 3 feature importance:

Built-in importance

Permutation based importance

SHAP importance

All plots are for the same model! As you see, there is a difference in the results. I prefer permutation-based importance because I have a clear picture of which feature impacts the performance of the model (if there is no high collinearity).
0 讨论(0)

查看其它11个回答
发布评论:

提交评论
- 加载中...

How to get feature importance in xgboost?

Built-in feature importance

Permutation based importance

SHAP based importance

Built-in importance

Permutation based importance

SHAP importance