Question
Hello, I plotted a graph using feature_importance from xgboost. However, the graph labels the bars with "f-values", so I cannot tell which feature each bar represents. One approach I have heard of is to map the index of each feature in my dataframe to the index in the feature_importance "f-values" and select the columns manually. How do I go about doing this? If there is another way to do it, help would be truly appreciated.
Here is my code below:
feature_importance = pd.Series(model.booster().get_fscore()).sort_values(ascending=False)
feature_importance.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')
Here is the graph:
print(feature_importance.head())
Output:
f20 320
f22 85
f29 67
f34 38
f81 20
Answer 1:
I tried a simple example to see what is going on. Here is the code I wrote:
import pandas as pd
import xgboost as xgb
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
model = xgb.XGBRegressor()
size = 100
data = pd.DataFrame([], columns=['a','b','c','target'])
data['a'] = np.random.rand(size)
data['b'] = np.random.rand(size)
data['c'] = np.random.rand(size)
data['target'] = np.random.rand(size)*data['a'] + data['b']
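# Recent pandas requires the columns= keyword; recent xgboost renamed booster() to get_booster().
model.fit(data.drop(columns='target'), data.target)
feature_importance = pd.Series(model.get_booster().get_fscore()).sort_values(ascending=False)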
feature_importance.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')
In the resulting plot the bars are labelled with the DataFrame's column names, so the labels are fine.
Now let's pass an array instead of a DataFrame:
model.fit(np.array(data.drop(columns='target')), data.target)
feature_importance = pd.Series(model.get_booster().get_fscore()).sort_values(ascending=False)
feature_importance.plot(kind='bar', title='Feature Importances')
plt.ylabel('Feature Importance Score')
Hence your problem: an np.array has no index or column names by default, so xgboost generates default feature names (f0, f1, ..., fn).
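If the model has already been trained on an array, the manual mapping the question asks about can be sketched as below. This is only a minimal illustration: it assumes the default names have the form f followed by the column position, and that the column list is in the same order as the data passed to fit. The helper name map_fscores_to_columns and the sample scores are made up for the example and are not part of xgboost.

```python
def map_fscores_to_columns(fscores, columns):
    """Rename default 'f<N>' keys to the column name at position N."""
    named = {}
    for key, score in fscores.items():
        # Default names look like 'f0', 'f1', ...; drop the leading 'f'
        # to recover the column position.
        position = int(key[1:])
        named[columns[position]] = score
    return named


# Hypothetical column names and scores, for illustration only.
columns = ['a', 'b', 'c']
fscores = {'f2': 5, 'f0': 12}

named = map_fscores_to_columns(fscores, columns)
# Sort by score, highest first, as in the original plot.
print(sorted(named.items(), key=lambda kv: kv[1], reverse=True))
```

With a real model, fscores would be the dictionary returned by get_fscore() and columns would be list(df.columns) for the DataFrame the array was built from. The result can be wrapped in pd.Series(named).sort_values(ascending=False) and plotted as before.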
Source: https://stackoverflow.com/questions/42356533/mapping-the-index-of-the-feat-importances-to-the-index-of-columns-in-a-dataframe