Question
I'm using scikit-learn's RandomizedSearchCV for model selection to find the best fit for a LightGBM LGBMClassifier model, but I'm having trouble figuring out which features have been selected for it. I can print the importance of each feature with:
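import lightgbm as lgbm
import pandas as pd
from scipy.stats import zscore  # assumed origin of zscore

lgbm_clf = lgbm.LGBMClassifier(boosting_type='gbdt',....
lgbm_clf.fit(X_train, y_train)
# Remember the configured importance type so it can be restored afterwards
importance_type = lgbm_clf.importance_type
lgbm_clf.importance_type = "gain"
gain = lgbm_clf.feature_importances_
lgbm_clf.importance_type = "split"
split = lgbm_clf.feature_importances_
lgbm_clf.importance_type = importance_type
# One row per feature, with standardized gain and split importances
feature_importance = pd.DataFrame(
    dict(snp=data.columns, zgain=zscore(gain), zsplit=zscore(split))
)
feature_importance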
But how do I know which features have actually been used in the model?
For example, if I try:
lgbm.plot_split_value_histogram(lgbm_clf, 1)
I get the error: ValueError: Cannot plot split value histogram, because feature 1 was not used in splitting
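The error message suggests that a feature with no splits counts as "not used". A minimal sketch of that idea, assuming "used" simply means a non-zero split count: the helper `used_features` and the `snp_*` names and counts below are hypothetical and only for illustration. With a fitted model, the counts would presumably come from `lgbm_clf.booster_.feature_importance(importance_type="split")` and the names from `lgbm_clf.booster_.feature_name()`.

```python
def used_features(feature_names, split_counts):
    """Return the names of features whose split count is non-zero."""
    if len(feature_names) != len(split_counts):
        raise ValueError("feature_names and split_counts must have the same length")
    return [name for name, count in zip(feature_names, split_counts) if count > 0]


# Made-up counts for illustration only; with a fitted model they would come from
# lgbm_clf.booster_.feature_importance(importance_type="split").
example_names = ["snp_0", "snp_1", "snp_2"]
example_counts = [4, 0, 2]
print(used_features(example_names, example_counts))
```

When the model comes out of RandomizedSearchCV with the default `refit=True`, the refitted classifier should be available as the search object's `best_estimator_`, so the same check could be applied to that estimator's booster.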
This is part of a broader question that I asked in "How to compare feature selection regression-based algorithm with tree-based algorithms?".
Thank you!
Source: https://stackoverflow.com/questions/60681929/how-to-get-the-features-selected-by-the-randomizedsearchcv-for-lgbmclassifier-mo