How to get the features selected by the RandomizedSearchCV for LGBMClassifier model?

Submitted by 蹲街弑〆低调 on 2020-03-23 08:17:49

Question


I'm using RandomizedSearchCV (sklearn) for model selection to find the best fit for a LightGBM LGBMClassifier model, but I'm having trouble figuring out which features have been selected. I can print the importance of each one with:

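import lightgbm as lgbm
import pandas as pd
from scipy.stats import zscore  # assumed source of zscore

lgbm_clf = lgbm.LGBMClassifier(boosting_type='gbdt',....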
lgbm_clf.fit(X_train, y_train)
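# Remember the configured importance type so it can be restored afterwards
importance_type = lgbm_clf.importance_type
# "gain": total gain of the splits that use each feature
lgbm_clf.importance_type = "gain"
gain = lgbm_clf.feature_importances_
# "split": number of times each feature is used in a split
lgbm_clf.importance_type = "split"
split = lgbm_clf.feature_importances_
lgbm_clf.importance_type = importance_type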

feature_importance = pd.DataFrame(
    dict(snp=data.columns, zgain=zscore(gain), zsplit=zscore(split))
)
feature_importance

But how do I know which features have been used in the model?

For example, if I try:

lgbm.plot_split_value_histogram(lgbm_clf, 1)

I get the error: ValueError: Cannot plot split value histogram, because feature 1 was not used in splitting
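
The error message suggests one way to check this: a feature that never appears in a split has a "split" importance of zero. Note that RandomizedSearchCV searches over hyperparameter settings rather than selecting features, so the features "used" are simply those the fitted trees split on. The following is a minimal sketch under that assumption; the helper names used_features and unused_feature_indices, and the example counts and feature names, are hypothetical and only illustrate the idea.

```python
def used_features(split_counts, feature_names):
    """Return the names of features that appear in at least one tree split."""
    if len(split_counts) != len(feature_names):
        raise ValueError("split_counts and feature_names must have the same length")
    return [name for name, count in zip(feature_names, split_counts) if count > 0]


def unused_feature_indices(split_counts):
    """Return the positional indices of features that were never split on."""
    return [i for i, count in enumerate(split_counts) if count == 0]


# Hypothetical split counts for four features (illustrative data only).
example_counts = [12, 0, 5, 0]
example_names = ["snp_a", "snp_b", "snp_c", "snp_d"]

print(used_features(example_counts, example_names))   # ['snp_a', 'snp_c']
print(unused_feature_indices(example_counts))         # [1, 3]
```

With a fitted search object (called search here as an assumption), the counts could come from search.best_estimator_.booster_.feature_importance(importance_type="split") and the names from data.columns. Any index reported as unused would be expected to trigger the ValueError above when passed to lgbm.plot_split_value_histogram.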

This question is part of a broader question asked in "How to compare feature selection regression-based algorithm with tree-based algorithms?".

Thank you!

Source: https://stackoverflow.com/questions/60681929/how-to-get-the-features-selected-by-the-randomizedsearchcv-for-lgbmclassifier-mo
