Recursive feature elimination on Random Forest using scikit-learn

一向 2020-12-28 18:06

I'm trying to perform recursive feature elimination using scikit-learn and a random forest classifier, with OOB ROC as the method of scoring each subset created

4 answers
  •  没有蜡笔的小新
    2020-12-28 18:44

    Here's what I've done to adapt RandomForestClassifier to work with RFECV:

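    class RandomForestClassifierWithCoef(RandomForestClassifier):
        """Random forest that also exposes its importances as coef_."""
        def fit(self, *args, **kwargs):
            super().fit(*args, **kwargs)
            # RFECV looks for coef_, so mirror feature_importances_ there.
            self.coef_ = self.feature_importances_
            return self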
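
As a side note, the subclass may not be needed on more recent scikit-learn releases, where RFE and RFECV can read `feature_importances_` directly. The following is only a minimal sketch under that assumption (it does not apply to the 0.15.1 setup used in this answer), and the parameter values are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = load_iris(return_X_y=True)

# Assumption: the installed scikit-learn lets RFE fall back to
# feature_importances_ when the estimator has no coef_ attribute.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rfe = RFE(estimator=rf, n_features_to_select=2, step=1)
rfe.fit(X, y)

print(rfe.support_)  # boolean mask of the features that were kept
print(rfe.ranking_)  # rank 1 marks a kept feature
```

If this raises an error about a missing `coef_`, the installed version still needs the wrapper class.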
    

    The RandomForestClassifierWithCoef class alone is enough if you score with 'accuracy' or 'f1'. With 'roc_auc', RFECV complains that the multiclass format is not supported, because the iris target has three classes. Converting it to a two-class problem with the code below makes 'roc_auc' scoring work. (Using Python 3.4.1 and scikit-learn 0.15.1.)

    y = (pd.Series(iris.target, name='target') == 2).astype(int)
    

    Plugging into your code:

    from sklearn import datasets
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV

    class RandomForestClassifierWithCoef(RandomForestClassifier):
        def fit(self, *args, **kwargs):
            super().fit(*args, **kwargs)
            # RFECV looks for coef_, so mirror feature_importances_ there.
            self.coef_ = self.feature_importances_
            return self

    iris = datasets.load_iris()
    x = pd.DataFrame(iris.data, columns=['var1', 'var2', 'var3', 'var4'])
    # Binary target (class 2 vs. the rest) so that 'roc_auc' is defined.
    y = (pd.Series(iris.target, name='target') == 2).astype(int)
    rf = RandomForestClassifierWithCoef(n_estimators=500, min_samples_leaf=5, n_jobs=-1)
    rfecv = RFECV(estimator=rf, step=1, cv=2, scoring='roc_auc', verbose=2)
    selector = rfecv.fit(x, y)
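
Once RFECV has been fitted, the chosen subset can be read from `support_` and `n_features_`. Since the question asks about OOB ROC, the sketch below also refits a forest on the selected columns with `oob_score=True` and computes the ROC AUC from `oob_decision_function_`. This is only one possible way to get an OOB-based score after the fact; RFECV itself still scores each subset by cross-validation. The names `selected`, `rf_oob` and `oob_auc`, the tree counts and the `random_state` values are assumptions made for the example:

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.metrics import roc_auc_score


class RandomForestClassifierWithCoef(RandomForestClassifier):
    """Wrapper from the answer: expose feature_importances_ as coef_."""
    def fit(self, *args, **kwargs):
        super().fit(*args, **kwargs)
        self.coef_ = self.feature_importances_
        return self


iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns=['var1', 'var2', 'var3', 'var4'])
# Binary target (class 2 vs. the rest), as in the answer.
y = (pd.Series(iris.target, name='target') == 2).astype(int)

rf = RandomForestClassifierWithCoef(n_estimators=100, min_samples_leaf=5,
                                    random_state=0)
rfecv = RFECV(estimator=rf, step=1, cv=2, scoring='roc_auc')
rfecv.fit(x, y)

# Columns kept by the cross-validated elimination.
selected = list(x.columns[rfecv.support_])
print(rfecv.n_features_, selected)

# Hypothetical follow-up: OOB ROC AUC of a forest trained on the kept columns.
rf_oob = RandomForestClassifier(n_estimators=200, min_samples_leaf=5,
                                oob_score=True, random_state=0)
rf_oob.fit(x[selected], y)
# Column 1 holds the out-of-bag probability of the positive class.
oob_auc = roc_auc_score(y, rf_oob.oob_decision_function_[:, 1])
print(oob_auc)
```

With very few trees some samples may never be out of bag, which leaves NaN values in `oob_decision_function_`, so keep `n_estimators` reasonably large when trying this.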
    
