For a project I am comparing a number of tree-based ensembles, using the regression algorithms (Random Forest, Extra Trees, AdaBoost and Bagging) of scikit-learn. To compare and interpret them I want to look at the feature importances of each model, but unlike the other ensembles, the bagging estimator does not expose a feature_importances_ attribute.
I encountered the same problem, and the average feature importance was what I was interested in as well. Furthermore, I needed to have a feature_importances_ attribute exposed by (i.e. accessible from) the bagging classifier object, so that it could be used inside another scikit-learn algorithm (in my case, RFE with a roc_auc scorer).
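For context: if you only need the averaged values and not the attribute itself, you can compute them directly from a fitted model. A minimal sketch, using a synthetic dataset as a stand-in for your own data:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# placeholder data, just so there is something to fit
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

bagging = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)

# the default base estimator is a decision tree, so each fitted
# estimator carries its own feature_importances_; average them
mean_importances = np.mean(
    [est.feature_importances_ for est in bagging.estimators_], axis=0)
print(mean_importances)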
Since I did need the attribute on the estimator object itself, I chose to subclass BaggingClassifier and override fit(), to gain direct access to the mean feature_importances_ (or coef_) of the base estimators.
Here is how to do so:
import numpy as np
from sklearn.ensemble import BaggingClassifier

class BaggingClassifierCoefs(BaggingClassifier):
    # No custom __init__ needed: a **kwargs-only __init__ would break
    # scikit-learn's get_params()/clone(), which RFE relies on. Fitted
    # attributes (trailing underscore) are set in fit(), per convention.

    def fit(self, X, y, sample_weight=None):
        # override fit() to also compute the mean feature importance
        fitted = super().fit(X, y, sample_weight=sample_weight)
        if hasattr(fitted.estimators_[0], 'feature_importances_'):
            # tree-based base estimators
            self.feature_importances_ = np.mean(
                [est.feature_importances_ for est in fitted.estimators_], axis=0)
        else:
            # linear base estimators expose coef_ instead
            self.feature_importances_ = np.mean(
                [est.coef_ for est in fitted.estimators_], axis=0)
        return fitted
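With that in place, the subclass can be dropped into RFE like any other estimator. A minimal usage sketch (the dataset, n_estimators and n_features_to_select are placeholders for your own setup):

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# RFE requires the estimator to expose feature_importances_ (or coef_),
# which is exactly what the subclass now provides after fitting
selector = RFE(BaggingClassifierCoefs(n_estimators=50, random_state=0),
               n_features_to_select=5)
scores = cross_val_score(selector, X, y, scoring='roc_auc', cv=5)
print(scores.mean())

Note that dropping the custom __init__ matters here: RFE clones the estimator internally, and cloning only round-trips parameters that appear explicitly in the __init__ signature.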