Using scikit to determine contributions of each feature to a specific class prediction

Asked by 粉色の甜心 on 2020-12-02 13:57

I am using a scikit extra trees classifier:

model = ExtraTreesClassifier(n_estimators=10000, n_jobs=-1, random_state=0)

Once the model is fitted, how can I determine the contribution of each feature to a specific class prediction?

5 Answers
  •  天命终不由人, 2020-12-02 14:07

    So far I have been checking eli5 and treeinterpreter (both mentioned before), and I think eli5 will be the most helpful, because it has more options, is more generic, and is more up to date.
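
    For reference, treeinterpreter exposes signed per-class contributions directly. A minimal sketch, assuming the fitted model and data X from the code further down:

    from treeinterpreter import treeinterpreter as ti

    # For each sample: prediction = bias + sum(contributions), per class.
    # contributions is signed, with shape (n_samples, n_features, n_classes).
    prediction, bias, contributions = ti.predict(model, X[:1])
    print(contributions[0])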

    Nevertheless, after applying eli5 to a particular case, I could not obtain negative contributions for ExtraTreesClassifier. Researching a little more, I realised I was obtaining the importance, or weight, as seen here. Since I was interested in something like a contribution, as in the title of this question, the distinction matters: a feature can have a negative effect on a class, but when measuring importance the sign is discarded, so features with positive and negative effects are lumped together.
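
    As a quick, self-contained illustration of that difference:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import ExtraTreesClassifier

    X, y = load_iris(return_X_y=True)
    clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Global importances are non-negative by construction: the sign of a
    # feature's effect on any particular class is lost.
    print(clf.feature_importances_)
    assert np.all(clf.feature_importances_ >= 0)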

    Because I was very interested in the sign, I did the following: 1) obtain the contributions for all cases, 2) aggregate all the results so that the sign is preserved. Not a very elegant solution, and there is probably something better out there, but I post it here in case it helps.

    I reproduce the same setup as the previous post.

    from sklearn import datasets
    from sklearn.model_selection import train_test_split  # cross_validation was removed in modern scikit-learn
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import  (ExtraTreesClassifier, RandomForestClassifier, 
                                  AdaBoostClassifier, GradientBoostingClassifier)
    import eli5
    import pandas as pd  # used below to aggregate the explanations
    
    
    iris = datasets.load_iris()  #sample data
    X, y = iris.data, iris.target
    #split into training and test 
    X_train, X_test, y_train, y_test = train_test_split( 
        X, y, test_size=0.33, random_state=0)
    
    # fit the model on the training set
    #model = DecisionTreeClassifier(random_state=0)
    model = ExtraTreesClassifier(n_estimators=100)
    
    model.fit(X_train,y_train)
    
    
    aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(model, X[0], top=X.shape[1])
    
    aux1
    

    With this output (eli5 renders the explanation as an HTML table of per-class feature weights):
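
    If you are working outside a notebook, a plain-text rendering of the same explanation works too:

    print(eli5.format_as_text(aux1))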

    The previous result is for a single case; I want to run over all cases and compute an average:

    This is what a dataframe with the results looks like:

    aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(model, X[0], top=X.shape[1])
    aux1 = eli5.format_as_dataframe(aux1)
    # aux1.index = aux1['feature']
    # del aux1['target']
    aux1
    
    
        target  feature    weight  value
    0        0   <BIAS>  0.340000    1.0
    1        0       x3  0.285764    0.2
    2        0       x2  0.267080    1.4
    3        0       x1  0.058208    3.5
    4        0       x0  0.048949    5.1
    5        1   <BIAS>  0.310000    1.0
    6        1       x0 -0.004606    5.1
    7        1       x1 -0.048211    3.5
    8        1       x2 -0.111974    1.4
    9        1       x3 -0.145209    0.2
    10       2   <BIAS>  0.350000    1.0
    11       2       x1 -0.009997    3.5
    12       2       x0 -0.044343    5.1
    13       2       x3 -0.140554    0.2
    14       2       x2 -0.155106    1.4
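
    Note that, for each class, the <BIAS> row plus the feature weights adds up to the predicted probability for this sample (here roughly 1.0, 0.0 and 0.0), which can be checked with:

    per_class = aux1.groupby('target')['weight'].sum()
    print(per_class.values)
    print(model.predict_proba(X[0:1]))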
    

    So I wrote a function to combine and average this kind of table:

    def concat_average_dfs(aux2, aux3):
        # Put both frames on the same (feature, target) index. The try/except
        # is here because the function is called repeatedly and a frame may
        # already be indexed; not the cleanest way to handle it.
        try:
            aux2.set_index(['feature', 'target'], inplace=True)
        except KeyError:
            pass
        try:
            aux3.set_index(['feature', 'target'], inplace=True)
        except KeyError:
            pass
        # Concatenate and take the mean per (feature, target) pair
        aux = pd.DataFrame(pd.concat([aux2['weight'], aux3['weight']]).groupby(level=[0, 1]).mean())
        # Return in order
        # return aux.sort_values(['weight'], ascending=[False], inplace=True)
        return aux
    aux2 = aux1.copy(deep=True)
    aux3 = aux1.copy(deep=True)
    
    concat_average_dfs(aux3,aux2)
    

    Now I only have to apply the previous function to every example I want. Here I take the whole population, not only the training set, to check the average effect over all real cases:

    aux_total = None
    for i in range(X.shape[0]):
        aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(
            model, X[i], top=X.shape[1])
        aux1 = eli5.format_as_dataframe(aux1)

        # Keep a running (pairwise) average with the accumulated table
        if aux_total is None:
            aux_total = aux1
        else:
            aux_total = concat_average_dfs(aux1, aux_total)
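
    Note that concat_average_dfs computes a pairwise running average, which weights later samples more heavily (the last sample contributes 1/2, the one before 1/4, and so on). If an exact mean is wanted, a simpler sketch is to collect every explanation and take a single groupby mean:

    dfs = []
    for i in range(X.shape[0]):
        expl = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(
            model, X[i], top=X.shape[1])
        dfs.append(eli5.format_as_dataframe(expl))
    aux_total = pd.concat(dfs).groupby(['target', 'feature'])['weight'].mean()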
    

    The result is a table showing the average effect of each feature over my whole real population.

    A companion notebook is available in my GitHub.
