I am using a scikit-learn extra trees classifier:
model = ExtraTreesClassifier(n_estimators=10000, n_jobs=-1, random_state=0)
Once the model is fitted, how can I obtain the contribution (with sign) of each feature to the prediction of each class?
So far I have been checking eli5 and treeinterpreter (both have been mentioned before), and I think eli5 will be the most helpful, because it has more options, is more generic, and is more up to date.
Nevertheless, after applying eli5 to a particular case for a while, I could not obtain negative contributions for ExtraTreesClassifier. Researching a little more, I realised that what I was obtaining was the importance or weight, as seen here. I was more interested in something like a contribution, as mentioned in the title of this question: some features can have a negative effect, but when measuring importance the sign is irrelevant, so features with positive and negative effects are put together.
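For reference, these unsigned importances are what I was getting at first (a minimal sketch using the standard scikit-learn and eli5 APIs, assuming model is the fitted ExtraTreesClassifier from the question):

import eli5
# Global importances: magnitude only, the sign is lost here
print(model.feature_importances_)
# Same idea expressed through eli5
eli5.explain_weights(model)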
Because I was very interested in the sign, I did the following: 1) obtain the contributions for all cases, 2) aggregate all the results so that the sign of each effect is preserved. It is not a very elegant solution and there is probably something better out there, but I post it here in case it helps.
I reproduce the same setup as the previous post.
from sklearn import datasets
from sklearn.model_selection import train_test_split  # (formerly sklearn.cross_validation)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)
import eli5
import pandas as pd
iris = datasets.load_iris() #sample data
X, y = iris.data, iris.target
#split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)
# fit the model on the training set
#model = DecisionTreeClassifier(random_state=0)
model = ExtraTreesClassifier(n_estimators=100)
model.fit(X_train, y_train)
aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(model, X[0], top=X.shape[1])
aux1
With output: an eli5 explanation showing, for each class, the contribution of each feature.
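Outside a notebook, the same explanation can also be printed as plain text (a small aside using eli5's formatters, not part of the original workflow):

print(eli5.format_as_text(aux1))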
The previous result works for one case; I want to run it for all cases and compute an average. This is what a dataframe with the results looks like:
aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(model, X[0], top=X.shape[1])
aux1 = eli5.format_as_dataframe(aux1)
# aux1.index = aux1['feature']
# del aux1['target']
aux1
    target feature    weight  value
0        0  <BIAS>  0.340000    1.0
1        0      x3  0.285764    0.2
2        0      x2  0.267080    1.4
3        0      x1  0.058208    3.5
4        0      x0  0.048949    5.1
5        1  <BIAS>  0.310000    1.0
6        1      x0 -0.004606    5.1
7        1      x1 -0.048211    3.5
8        1      x2 -0.111974    1.4
9        1      x3 -0.145209    0.2
10       2  <BIAS>  0.350000    1.0
11       2      x1 -0.009997    3.5
12       2      x0 -0.044343    5.1
13       2      x3 -0.140554    0.2
14       2      x2 -0.155106    1.4
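As a sanity check (my own aside, not needed for the rest): with this decision-path style explanation, the bias plus the feature weights for each target should add up, roughly, to the predicted probability for that class, which can be verified like this:

per_class = aux1.groupby('target')['weight'].sum()
print(per_class.values)                          # here roughly [1.0, 0.0, 0.0]
print(model.predict_proba(X[0].reshape(1, -1)))  # should be similar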
So I create a function to combine tables of this kind:
def concat_average_dfs(aux2, aux3):
    # Put both dataframes on the same (feature, target) index.
    # I use try/except because I want to use this function recursively and
    # one of the dataframes may already have this index set. This
    # is not the best way.
    try:
        aux2.set_index(['feature', 'target'], inplace=True)
    except KeyError:
        pass
    try:
        aux3.set_index(['feature', 'target'], inplace=True)
    except KeyError:
        pass
    # Concatenate and compute the mean per (feature, target)
    aux = pd.DataFrame(pd.concat([aux2['weight'], aux3['weight']])
                       .groupby(level=[0, 1]).mean())
    # Return in order
    # return aux.sort_values(['weight'], ascending=[False], inplace=True)
    return aux
aux2 = aux1.copy(deep=True)
aux3 = aux1.copy(deep=True)
concat_average_dfs(aux3, aux2)
So now I only have to apply the previous function to all the examples I wish. I will take the whole population, not only the training set, to check the average effect over all real cases:
aux_total = None
for i in range(X.shape[0]):
    aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(
        model, X[i], top=X.shape[1])
    aux1 = eli5.format_as_dataframe(aux1)
    if aux_total is None:
        aux_total = aux1
    else:
        aux_total = concat_average_dfs(aux1, aux_total)
With this result: the last table shows the average effect of each feature over my whole real population.
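One caveat of averaging pairwise inside the loop is that later cases end up weighted more heavily than earlier ones. If an exact average is preferred, the per-case tables can be collected and aggregated once at the end; a sketch under the same setup (all_dfs and exact_avg are names I introduce here):

all_dfs = []
for i in range(X.shape[0]):
    expl = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(
        model, X[i], top=X.shape[1])
    all_dfs.append(eli5.format_as_dataframe(expl))
# A single concat + groupby gives every case the same weight in the mean
exact_avg = (pd.concat(all_dfs)
             .groupby(['feature', 'target'])['weight']
             .mean()
             .reset_index())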
A companion notebook is on my GitHub.