How does sklearn random forest index feature_importances_


`feature_importances_` returns an array where each index corresponds to the estimated importance of the feature at that same index in the training set. No sorting is done internally; it is a 1-to-1 correspondence with the features passed to the model during training.
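To make the index correspondence concrete, here is a minimal sketch using synthetic data (the array shapes and the "target depends only on column 0" setup are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data: 3 features, binary target that depends only on column 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One importance per input column, in the same order as the columns of X
print(len(rf.feature_importances_))      # 3
print(rf.feature_importances_.argmax())  # 0 -- column 0 dominates, as expected
```

Because the order matches the input columns, `argmax` on the importances directly identifies the most informative column.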

If you stored your feature names as a numpy array and made sure it is consistent with the features passed to the model, you can take advantage of numpy indexing to do it.

importances = rf.feature_importances_
important_names = feature_names[importances > np.mean(importances)]
print(important_names)

Here's what I use to print and plot feature importances including the names, not just the values:

importances = pd.DataFrame({'feature': X_train.columns,
                            'importance': np.round(clf.feature_importances_, 3)})
importances = importances.sort_values('importance', ascending=False).set_index('feature')
print(importances)
importances.plot.bar()

Full example

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # sklearn.cross_validation is removed in modern versions
import numpy as np
import pandas as pd

# set vars
predictors = ['x1','x2']
response = 'y'

X = df[predictors]
y = df[response]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# run model; max_features must not exceed the number of predictors (2 here)
clf = RandomForestClassifier(max_features=2)
clf.fit(X_train.values, y_train.values)

# show and plot importances
importances = pd.DataFrame({'feature': X_train.columns,
                            'importance': np.round(clf.feature_importances_, 3)})
importances = importances.sort_values('importance', ascending=False).set_index('feature')
print(importances)
importances.plot.bar()

Get the variance explained (the R² score):

regressor.score(X, y)

Get the importance of each variable:

importances = regressor.feature_importances_
print(importances)
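Putting both together, here is a minimal regression sketch (the synthetic data and `regressor` name are illustrative assumptions, matching the snippet above):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: y depends strongly on column 0, barely on column 1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)

regressor = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print(regressor.score(X, y))           # R^2 on the training data, close to 1
print(regressor.feature_importances_)  # one value per column; they sum to 1
```

Note that `score` on the training data overstates generalization; score on a held-out set for an honest estimate.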