How to calculate feature importance for each model in cross-validation in sklearn


cross_val_score() does not return the estimators fitted on each train-test fold.

You need to use cross_validate() and set return_estimator=True.

Here is a working example:

from sklearn import datasets
from sklearn.model_selection import cross_validate
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

clf = RandomForestClassifier(n_estimators=10, random_state=42, class_weight="balanced")

# return_estimator=True keeps the estimator fitted on each fold in output['estimator']
output = cross_validate(clf, X, y, cv=2, scoring='accuracy', return_estimator=True)

for idx, estimator in enumerate(output['estimator']):
    print("Features sorted by their score for estimator {}:".format(idx))
    feature_importances = pd.DataFrame(estimator.feature_importances_,
                                       index=diabetes.feature_names,
                                       columns=['importance']).sort_values('importance', ascending=False)
    print(feature_importances)

Output:

Features sorted by their score for estimator 0:
     importance
s6     0.137735
age    0.130152
s5     0.114561
s2     0.113683
s3     0.112952
bmi    0.111057
bp     0.108682
s1     0.090763
s4     0.056805
sex    0.023609
Features sorted by their score for estimator 1:
     importance
age    0.129671
bmi    0.125706
s2     0.125304
s1     0.113903
bp     0.111979
s6     0.110505
s5     0.106099
s3     0.098392
s4     0.054542
sex    0.023900
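
A side note (not part of the original answer): load_diabetes() is a regression dataset with a continuous target, so depending on your scikit-learn version, fitting RandomForestClassifier on it with scoring='accuracy' may raise an error or give a meaningless score. The return_estimator=True pattern is identical for regressors; here is a minimal sketch assuming you switch to RandomForestRegressor and an 'r2' scorer (both are my substitutions, not from the original answer):

from sklearn import datasets
from sklearn.model_selection import cross_validate
from sklearn.ensemble import RandomForestRegressor
import pandas as pd

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Same idea as above: return_estimator=True exposes the model fitted on each fold
reg = RandomForestRegressor(n_estimators=10, random_state=42)
output = cross_validate(reg, X, y, cv=2, scoring='r2', return_estimator=True)

for idx, estimator in enumerate(output['estimator']):
    importances = pd.Series(estimator.feature_importances_,
                            index=diabetes.feature_names).sort_values(ascending=False)
    print("Fold {} feature importances:".format(idx))
    print(importances)

If you want a single ranking instead of one per fold, you can stack the per-fold feature_importances_ arrays and average them, e.g. with numpy.mean over the fold axis.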