I'm using xgboost to build a model, and I'm trying to find the importance of each feature using get_fscore(), but it returns {}. My training code looks roughly like this:
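(A minimal sketch; X_train, y_train, and the parameter values are placeholders, not the exact original code.)

import xgboost as xgb

# Placeholder data and parameters
d_train = xgb.DMatrix(X_train, label=y_train)
params = {'max_depth': 6, 'eta': 0.1, 'objective': 'binary:logistic'}
watchlist = [(d_train, 'train')]
bst = xgb.train(params, d_train, num_boost_round=100, evals=watchlist)
print(bst.get_fscore())  # returns {} in my case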
I don't know how to get the values directly, but there is a good way to plot the feature importances:
import xgboost as xgb
import matplotlib.pyplot as plt

model = xgb.train(params, d_train, 1000, watchlist)
fig, ax = plt.subplots(figsize=(12, 18))
xgb.plot_importance(model, max_num_features=50, height=0.8, ax=ax)
plt.show()
In your code, you can also get the feature importance for each feature as a dict:
bst.get_score(importance_type='gain')
>> {'ftr_col1': 77.21064539577829,
    'ftr_col2': 10.28690566363971,
    'ftr_col3': 24.225014841466294,
    'ftr_col4': 11.234086283060112}
Explanation: The train() API's get_score() method is defined as:

get_score(fmap='', importance_type='weight')

where importance_type can be 'weight' (the number of times a feature is used to split the data across all trees), 'gain', 'cover', 'total_gain', or 'total_cover'. See https://xgboost.readthedocs.io/en/latest/python/python_api.html
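A short sketch comparing the types (bst is assumed to be a trained Booster):

# Print the per-feature score under each available importance type
for imp_type in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print(imp_type, bst.get_score(importance_type=imp_type))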
If you trained the model through scikit-learn's GridSearchCV, try this (note that booster() was renamed get_booster() in newer xgboost versions):

fscore = clf.best_estimator_.get_booster().get_fscore()
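A minimal sketch of the assumed context (the estimator, parameter grid, and X, y are placeholders):

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Fit a small hypothetical grid search, then pull the booster's fscore
clf = GridSearchCV(XGBClassifier(), {'max_depth': [3, 5, 7]}, cv=3)
clf.fit(X, y)  # X, y: your training data
fscore = clf.best_estimator_.get_booster().get_fscore()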
For feature importance, try this:

Classification:

import pandas as pd

pd.DataFrame(list(bst.get_fscore().items()), columns=['feature', 'importance']).sort_values('importance', ascending=False)

Regression:

xgb.plot_importance(bst)
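plot_importance also accepts importance_type and max_num_features, so a gain-based ranking can be plotted directly (a sketch, with bst as above):

import xgboost as xgb
import matplotlib.pyplot as plt

xgb.plot_importance(bst, importance_type='gain', max_num_features=20)
plt.show()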
First, build the model with XGBoost:
import numpy as np
from matplotlib import pyplot
from xgboost import XGBClassifier, plot_importance

model = XGBClassifier()
model.fit(train, label)
model.feature_importances_ is an array, so we can sort it in descending order:
sorted_idx = np.argsort(model.feature_importances_)[::-1]
Then print the sorted importances together with the column names (assuming the data was loaded with pandas):
for index in sorted_idx:
    print([train.columns[index], model.feature_importances_[index]])
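Equivalently, a compact pandas version of the same listing (a sketch reusing the same train DataFrame):

import pandas as pd

# Pair each importance with its column name and sort descending
importances = pd.Series(model.feature_importances_, index=train.columns)
print(importances.sort_values(ascending=False))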
Furthermore, we can plot the importances with XGBoost's built-in function:
plot_importance(model, max_num_features=15)
pyplot.show()
Use max_num_features in plot_importance to limit the number of features shown, if you want.