How to get feature importance in xgboost?

Asked by 情深已故 on 2020-12-13 07:00

I'm using xgboost to build a model and trying to find the importance of each feature using get_fscore(), but it returns {}.

Here is my training code:

11 Answers
  • 2020-12-13 07:19

    Using sklearn API and XGBoost >= 0.81:

    clf.get_booster().get_score(importance_type="gain")
    

    or

    regr.get_booster().get_score(importance_type="gain")
    

    For this to work correctly, when you call regr.fit (or clf.fit), X must be a pandas.DataFrame.

  • 2020-12-13 07:25

    If you are using XGBRegressor, try model.get_booster().get_score().

    It returns scores that you can visualize directly with the plot_importance command.

  • 2020-12-13 07:26
    import matplotlib.pyplot as plt

    # feature_importances_ is populated after fitting the sklearn-API model
    print(model.feature_importances_)

    plt.bar(range(len(model.feature_importances_)), model.feature_importances_)
    plt.show()
    
  • 2020-12-13 07:29

    According to this post there are 3 different ways to get feature importance from Xgboost:

    • use built-in feature importance,
    • use permutation based importance,
    • use shap based importance.

    Built-in feature importance

    Code example:

    from xgboost import XGBRegressor
    import matplotlib.pyplot as plt

    xgb = XGBRegressor(n_estimators=100)
    xgb.fit(X_train, y_train)
    sorted_idx = xgb.feature_importances_.argsort()
    plt.barh(boston.feature_names[sorted_idx], xgb.feature_importances_[sorted_idx])
    plt.xlabel("Xgboost Feature Importance")
    

    Please be aware of what type of feature importance you are using. There are several types of importance; see the docs. The scikit-learn-like API of Xgboost returns gain importance, while get_fscore returns the weight type.

    Permutation based importance

    from sklearn.inspection import permutation_importance

    perm_importance = permutation_importance(xgb, X_test, y_test)
    sorted_idx = perm_importance.importances_mean.argsort()
    plt.barh(boston.feature_names[sorted_idx], perm_importance.importances_mean[sorted_idx])
    plt.xlabel("Permutation Importance")
    

    This is my preferred way to compute the importance. However, it can fail in the case of highly collinear features, so be careful! It uses permutation_importance from scikit-learn.

    SHAP based importance

    import shap

    explainer = shap.TreeExplainer(xgb)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test, plot_type="bar")
    

    To use the above code, you need to have shap package installed.

    I was running the example analysis on the Boston data (house-price regression from scikit-learn). Below are the 3 feature importance plots:

    Built-in importance

    Permutation based importance

    SHAP importance

    All plots are for the same model! As you can see, there is a difference in the results. I prefer permutation-based importance because it gives me a clear picture of which features impact the performance of the model (if there is no high collinearity).

  • 2020-12-13 07:34

    Get the table containing scores and feature names, and then plot it.

    import pandas as pd

    feature_important = model.get_booster().get_score(importance_type='weight')
    keys = list(feature_important.keys())
    values = list(feature_important.values())

    data = pd.DataFrame(data=values, index=keys, columns=["score"]).sort_values(by="score", ascending=False)
    data.plot(kind='barh')
    


  • 2020-12-13 07:35

    For anyone who comes across this issue while using xgb.XGBRegressor(), the workaround I'm using is to keep the data in a pandas.DataFrame() or numpy.array() and not to convert it to a DMatrix(). Also, I had to make sure the gamma parameter is not specified for the XGBRegressor.

    fit = alg.fit(dtrain[ft_cols].values, dtrain['y'].values)
    ft_weights = pd.DataFrame(fit.feature_importances_, columns=['weights'], index=ft_cols)
    

    After fitting the regressor, fit.feature_importances_ returns an array of weights which I'm assuming is in the same order as the feature columns of the pandas dataframe.

    My current setup is Ubuntu 16.04, Anaconda distro, python 3.6, xgboost 0.6, and sklearn 18.1.
