How to get feature importance in xgboost?


Question


I'm using xgboost to build a model and trying to find the importance of each feature using get_fscore(), but it returns {}.

My training code is:

import xgboost as xgb

dtrain = xgb.DMatrix(X, label=Y)
watchlist = [(dtrain, 'train')]
param = {'max_depth': 6, 'learning_rate': 0.03}
num_round = 200
bst = xgb.train(param, dtrain, num_round, watchlist)

So is there any mistake in my training code? How can I get the feature importance in xgboost?


Answer 1:


In your code you can get the feature importance for each feature as a dict:

bst.get_score(importance_type='gain')

>>{'ftr_col1': 77.21064539577829,
   'ftr_col2': 10.28690566363971,
   'ftr_col3': 24.225014841466294,
   'ftr_col4': 11.234086283060112}

Explanation: the get_score() method on the Booster returned by the train() API is defined as:

get_score(fmap='', importance_type='weight')

  • fmap (str (optional)) – The name of feature map file.
  • importance_type
    • ‘weight’ - the number of times a feature is used to split the data across all trees.
    • ‘gain’ - the average gain across all splits the feature is used in.
    • ‘cover’ - the average coverage across all splits the feature is used in.
    • ‘total_gain’ - the total gain across all splits the feature is used in.
    • ‘total_cover’ - the total coverage across all splits the feature is used in.

https://xgboost.readthedocs.io/en/latest/python/python_api.html
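
If you want to compare the different importance types side by side, here is a minimal sketch, assuming bst is the booster trained in the question (note that 'total_gain' and 'total_cover' are only available in reasonably recent xgboost versions):

# Sketch: print every importance type for the booster from the question.
# 'total_gain'/'total_cover' require a newer xgboost; drop them on old versions.
for imp_type in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print(imp_type, bst.get_score(importance_type=imp_type))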




Answer 2:


Using the sklearn API and XGBoost >= 0.81:

clf.get_booster().get_score(importance_type="gain")

or

regr.get_booster().get_score(importance_type="gain")

For this to work correctly, when you call regr.fit (or clf.fit), X must be a pandas.DataFrame.
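
As an illustration of why the DataFrame matters, here is a small sketch with made-up data: xgboost picks up the DataFrame's column names as feature names, so get_score() returns a readable dict instead of keys like f0, f1, and so on.

import numpy as np
import pandas as pd
from xgboost import XGBClassifier

# Hypothetical toy data; the column names become xgboost's feature names.
rng = np.random.RandomState(0)
X = pd.DataFrame({'age': rng.randint(18, 70, 200),
                  'income': rng.normal(50, 15, 200)})
y = (X['age'] > 40).astype(int)

clf = XGBClassifier(n_estimators=20, max_depth=3)
clf.fit(X, y)

# Keys are 'age' and 'income' rather than 'f0' and 'f1'.
print(clf.get_booster().get_score(importance_type='gain'))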




Answer 3:


Try this:

fscore = clf.best_estimator_.booster().get_fscore()
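
This presumably assumes clf is a fitted search object such as sklearn's GridSearchCV, and an old xgboost version in which the underlying Booster was exposed via a booster() method. Newer xgboost versions renamed that method, so the equivalent sketch under the same assumptions would be:

# Sketch, assuming `clf` is a fitted GridSearchCV over an XGBoost estimator;
# newer xgboost versions replaced booster() with get_booster().
fscore = clf.best_estimator_.get_booster().get_fscore()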



Answer 4:


I don't know how to get the values for certain, but there is a good way to plot feature importance:

import xgboost as xgb
import matplotlib.pyplot as plt

model = xgb.train(params, d_train, 1000, watchlist)
fig, ax = plt.subplots(figsize=(12, 18))
xgb.plot_importance(model, max_num_features=50, height=0.8, ax=ax)
plt.show()
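
If you would rather rank the plot by gain than by the default split count, plot_importance also accepts an importance_type argument (in reasonably recent xgboost versions):

# Same plot, but ranked by average gain instead of the default 'weight'.
xgb.plot_importance(model, importance_type='gain', max_num_features=50, height=0.8, ax=ax)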



Answer 5:


For feature importance, try this:

Classification:

import pandas as pd
pd.DataFrame(list(bst.get_fscore().items()), columns=['feature', 'importance']).sort_values('importance', ascending=False)

Regression:

xgb.plot_importance(bst)



Answer 6:


Build the model with XGBoost first:

import numpy as np
from matplotlib import pyplot
from xgboost import XGBClassifier, plot_importance

model = XGBClassifier()
model.fit(train, label)

model.feature_importances_ is an array, so we can sort the indices in descending order:

sorted_idx = np.argsort(model.feature_importances_)[::-1]

Then we can print the sorted importances together with the column names (assuming the data was loaded as a pandas DataFrame):

for index in sorted_idx:
    print([train.columns[index], model.feature_importances_[index]]) 

Furthermore, we can plot the importances with XGBoost's built-in function:

plot_importance(model, max_num_features = 15)
pyplot.show()

Use max_num_features in plot_importance to limit the number of features shown.
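
Equivalently, the sorted table can be built as a one-liner with a pandas Series (same assumptions as above: model is fitted and train is a DataFrame):

import pandas as pd

# Pair each importance with its column name, then sort descending.
importances = pd.Series(model.feature_importances_, index=train.columns)
print(importances.sort_values(ascending=False))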




Answer 7:


For anyone who comes across this issue while using xgb.XGBRegressor(), the workaround I'm using is to keep the data in a pandas.DataFrame() or numpy.array() and not to convert it to an xgb.DMatrix(). Also, I had to make sure the gamma parameter is not specified for the XGBRegressor.

import pandas as pd

fit = alg.fit(dtrain[ft_cols].values, dtrain['y'].values)
ft_weights = pd.DataFrame(fit.feature_importances_, columns=['weights'], index=ft_cols)

After fitting the regressor, fit.feature_importances_ returns an array of weights which I'm assuming is in the same order as the feature columns of the pandas DataFrame.

My current setup is Ubuntu 16.04, Anaconda distro, Python 3.6, XGBoost 0.6, and scikit-learn 0.18.1.




Answer 8:


Get the table containing scores and feature names, and then plot it.

import pandas as pd

feature_important = model.get_score(importance_type='weight')
keys = list(feature_important.keys())
values = list(feature_important.values())

data = pd.DataFrame(data=values, index=keys, columns=['score']).sort_values(by='score', ascending=False)
data.plot(kind='barh')





Answer 9:


import matplotlib.pyplot as plt

print(model.feature_importances_)

plt.bar(range(len(model.feature_importances_)), model.feature_importances_)
plt.show()


Source: https://stackoverflow.com/questions/37627923/how-to-get-feature-importance-in-xgboost
