Grid search on parameters inside the parameters of a BaggingClassifier

Submitted on 2019-12-24 20:32:13

Question


This is a follow up on a question answered here, but I believe it deserves its own thread.

In the previous question, we were dealing with “an Ensemble of Ensemble classifiers, where each has its own parameters.” Let's start with the example provided by MaximeKan in his answer:

my_est = BaggingClassifier(RandomForestClassifier(n_estimators=100, bootstrap=True,
    max_features=0.5), n_estimators=5, bootstrap_features=False, bootstrap=False,
    max_features=1.0, max_samples=0.6)

Now say I want to go one level above that: setting aside considerations like efficiency and computational cost, as a general concept, how would I run grid search with this kind of setup?

I can set up two parameter grids along these lines:

One for the BaggingClassifier:

BC_param_grid = {
'bootstrap': [True, False],
'bootstrap_features': [True, False],    
'n_estimators': [5, 10, 15],
'max_samples' : [0.6, 0.8, 1.0]
}

And one for the RandomForestClassifier:

RFC_param_grid = {
'bootstrap': [True, False],    
'n_estimators': [100, 200, 300],
'max_features' : [0.6, 0.8, 1.0]
}

Now I can call grid search with my estimator:

grid_search = GridSearchCV(estimator = my_est, param_grid = ???)

What do I do with the param_grid parameter in this case? Or more specifically, how do I use both of the parameter grids I set up?

I have to say, it feels like I’m playing with matryoshka dolls.


Answer 1:


Following @James Dellinger's comment above, and expanding from there, I was able to get it done. Turns out the "secret sauce" is indeed a mostly undocumented feature: the __ (double underscore) separator (there is a passing reference to it in the Pipeline documentation). Prefixing the name of an inner/base estimator parameter with the estimator's name followed by __ lets you build a single param_grid that covers parameters of both the outer and inner estimators.
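One way to see which double-underscore names are available (a quick sketch, using the estimator from the question) is to inspect get_params(): nested parameters show up with the inner estimator's name and __ as a prefix.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

# The estimator from the question. Passing the inner estimator positionally
# sidesteps the keyword rename (base_estimator -> estimator in scikit-learn 1.2).
my_est = BaggingClassifier(
    RandomForestClassifier(n_estimators=100, bootstrap=True, max_features=0.5),
    n_estimators=5, bootstrap_features=False, bootstrap=False,
    max_features=1.0, max_samples=0.6,
)

# Nested parameters carry a double-underscore prefix, e.g.
# base_estimator__n_estimators (estimator__n_estimators in newer versions).
nested = sorted(k for k in my_est.get_params() if "__" in k)
print(nested)
```

Any name printed here is a legal param_grid key, which is how the grid below was put together.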

For the example in the question, the outer estimator is BaggingClassifier and the inner/base estimator is RandomForestClassifier. First, import what is needed:

from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

followed by the param_grid assignment (in this case, the values from the example in the question):

param_grid = {
 'bootstrap': [True, False],
 'bootstrap_features': [True, False],    
 'n_estimators': [5, 10, 15],
 'max_samples' : [0.6, 0.8, 1.0],
 'base_estimator__bootstrap': [True, False],    
 'base_estimator__n_estimators': [100, 200, 300],
 'base_estimator__max_features' : [0.6, 0.8, 1.0]
}

And, finally, your grid search:

grid_search = GridSearchCV(
    BaggingClassifier(base_estimator=RandomForestClassifier()),
    param_grid=param_grid, cv=5)

And you're off to the races.



Source: https://stackoverflow.com/questions/54543612/grid-search-on-parameters-inside-the-parameters-of-a-baggingclassifier
