What's wrong here? Accidentally referencing an existing instance instead of making a new one

Posted by 不问归期 on 2019-12-06 08:34:43

The problem is that results['0']['rf'] and results['1']['rf'] are in fact the same object. Therefore, when you fit the pipeline in your loop:

results = dict()
for k in features.keys():
    results[k] = dict()
    for m in classifiers.keys():
        print(len(features[k]))
        results[k][m] = classifiers[m].fit(features[k], 'species', iris)

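you are re-fitting the same, already fitted pipeline on every pass. Each fit overwrites the previous one, so every results[k][m] ends up referring to whichever fit ran last.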
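The aliasing can be reproduced without scikit-learn. The sketch below uses a hypothetical stand-in class, ToyModel, whose fit method returns self; that is an assumption about how Classifier.fit behaves, in line with the scikit-learn convention. The names ToyModel and models and the feature lists are illustrative only.

```python
class ToyModel:
    """Hypothetical stand-in for Classifier; fit() returns self (assumed)."""

    def __init__(self):
        self.fitted_on = None

    def fit(self, columns):
        # Fitting overwrites whatever state an earlier fit left behind.
        self.fitted_on = list(columns)
        return self


features = {'0': ['sepal_length'], '1': ['sepal_length', 'petal_width']}
models = {'rf': ToyModel()}  # one shared instance, created once

results = {}
for k in features:
    results[k] = {}
    for m in models:
        results[k][m] = models[m].fit(features[k])

# Both entries refer to the same object, which only remembers the last fit.
same_object = results['0']['rf'] is results['1']['rf']
last_fit = results['0']['rf'].fitted_on
print(same_object, last_fit)
```

Checking identity with the is operator is a quick way to confirm whether two dictionary entries share one instance.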

To remedy this, you need to create a new instance of Classifier every time you fit it. One possible way to do this is to change your classifiers dictionary from one containing Classifier instances to one containing the arguments required to create a Classifier:

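from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

classifiers = {
    # Each entry: (estimator class, keyword arguments for that estimator)
    'rf': (RandomForestClassifier, dict(n_estimators=100, oob_score=True, bootstrap=True)),
    'ab': (AdaBoostClassifier, dict(n_estimators=50)),
}

Now, in your loop, use the Python idiom known as "tuple unpacking" to split each entry into the estimator class and its keyword arguments, then create a separate Classifier instance for each combination. This assumes Classifier accepts the estimator class followed by keyword arguments, as in Classifier(RandomForestClassifier, n_estimators=100):

results = dict()
for k in features:
    results[k] = dict()
    for m in classifiers:
        print(len(features[k]))
        estimator, kwargs = classifiers[m]
        # A fresh Classifier for every feature set / model pair
        classifier = Classifier(estimator, **kwargs)
        results[k][m] = classifier.fit(features[k], 'species', iris)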
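An alternative to storing constructor arguments is to store zero-argument factories built with functools.partial, so the loop only has to call each one. This is a different technique from the one described above, shown as a minimal sketch: ToyModel is a hypothetical stand-in for Classifier, and the names factories and features are illustrative.

```python
from functools import partial


class ToyModel:
    """Hypothetical stand-in for Classifier; takes settings as keyword arguments."""

    def __init__(self, **params):
        self.params = params
        self.fitted_on = None

    def fit(self, columns):
        self.fitted_on = list(columns)
        return self


# Each value is a zero-argument factory; calling it builds a new ToyModel.
factories = {
    'rf': partial(ToyModel, n_estimators=100),
    'ab': partial(ToyModel, n_estimators=50),
}
features = {'0': ['sepal_length'], '1': ['sepal_length', 'petal_width']}

results = {}
for k in features:
    results[k] = {}
    for m in factories:
        # factories[m]() creates a fresh instance for this combination.
        results[k][m] = factories[m]().fit(features[k])

distinct = results['0']['rf'] is not results['1']['rf']
print(distinct, results['0']['rf'].fitted_on, results['1']['rf'].fitted_on)
```

Because each call to a factory returns a new object, the fit for one feature set no longer overwrites the fit for another.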

Note that to iterate over the keys of a dictionary, one can simply write for key in dct:, as opposed to for key in dct.keys().
