How to binarize RandomForest to plot an ROC curve in Python?

Submitted by 心不动则不痛 on 2019-12-11 04:15:52

Question


I have 21 classes and I am using a RandomForest classifier. I want to plot an ROC curve, so I looked at the scikit-learn ROC example, which uses an SVM.

The SVM in that example (SVC) has parameters such as probability and decision_function_shape, which RandomForestClassifier does not have.
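For reference, RandomForestClassifier does not need an equivalent of SVC's probability=True: it always exposes predict_proba, which returns one probability column per class, ordered as in clf.classes_. A minimal sketch on random fake data shaped like the data in the edit below (the seed and n_estimators value are arbitrary assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(63, 20)                 # 63 samples, 20 random features
y = np.arange(63) // 3 + 1           # 21 classes (1..21), 3 samples each

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# No probability=True flag is needed; predict_proba is always available.
proba = clf.predict_proba(X)
print(proba.shape)                   # (63, 21): one column per class
print(clf.classes_[:5])              # column order of proba
```

Each row of proba sums to 1, so column k can serve directly as the ROC score for class clf.classes_[k].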

So how can I binarize the RandomForest output and plot an ROC curve?

Thank you
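One common reading of "binarize" is one-vs-rest, as in the scikit-learn SVM example: binarize only the true test labels with label_binarize and use column k of predict_proba as the score for class k. A sketch on larger random data (30 samples per class, an assumption so that every class appears in the test split); all variable names are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

rng = np.random.RandomState(0)
X = rng.rand(630, 20)                # random features (fake data)
y = np.arange(630) // 30 + 1         # 21 classes, 30 samples each

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)  # shape (n_test, 21)

# One-vs-rest: column k of y_bin is 1 where the true class is clf.classes_[k]
y_bin = label_binarize(y_test, classes=clf.classes_)

fpr, tpr, roc_auc = {}, {}, {}
for k, cls in enumerate(clf.classes_):
    fpr[cls], tpr[cls], _ = roc_curve(y_bin[:, k], y_score[:, k])
    roc_auc[cls] = auc(fpr[cls], tpr[cls])

# Micro-average: pool every (sample, class) pair into one binary problem
fpr["micro"], tpr["micro"], _ = roc_curve(y_bin.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
print(roc_auc["micro"])
```

On random features the AUCs should hover around 0.5; the point is the shape of the computation, not the scores.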

EDIT

To create the fake data: there are 20 features and 21 classes (3 samples per class).

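import numpy as np
import pandas as pd

# 63 rows x 20 random features; labels 1..21, three rows per label
df = pd.DataFrame(np.random.rand(63, 20))
label = np.arange(len(df)) // 3 + 1
df['label'] = label
df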


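from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline

# TO TRAIN THE MODEL: IT IS A STRATIFIED SHUFFLED SPLIT
x = df.drop(columns='label')   # features
y = df['label']                # class labels 1..21
clf = make_pipeline(RandomForestClassifier())
result_list = []
xSSSmean10 = []
for i in range(10):
    # Each test split needs at least one sample per class (21 here), so the
    # original test_size=0.1 (7 of 63 rows) fails on this fake data
    sss = StratifiedShuffleSplit(n_splits=10, test_size=21, random_state=i)
    scoresSSS = cross_val_score(clf, x, y, cv=sss)
    xSSSmean10.append(scoresSSS.mean())
result_list.append(xSSSmean10)
print(result_list)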
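If an ROC-based score is wanted from the stratified shuffle-split loop rather than a plot, recent scikit-learn versions accept scoring="roc_auc_ovr" in cross_val_score, which (as I understand it) macro-averages the one-vs-rest AUCs over classes. A sketch, again assuming 30 random samples per class so that each test split contains every class:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
x = rng.rand(630, 20)                # random features (fake data)
y = np.arange(630) // 30 + 1         # 21 classes, 30 samples each

clf = make_pipeline(RandomForestClassifier(n_estimators=50, random_state=0))
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

# One-vs-rest ROC AUC per split instead of the default accuracy
scores = cross_val_score(clf, x, y, cv=sss, scoring="roc_auc_ovr")
print(scores.mean())
```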

Answer 1:


With a multilabel random forest, each of your 21 labels becomes its own binary classification problem, so you can create an ROC curve for each of the 21 classes. Your y_train should then be a matrix of 0s and 1s with one column per label.
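The 0/1 matrix described above can be built with label_binarize. Fitting RandomForestClassifier on such a matrix makes it a multilabel (multi-output) forest, and predict_proba then returns a list with one (n_samples, 2) array per label instead of a single array. A sketch on random fake data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import label_binarize

rng = np.random.RandomState(0)
X = rng.rand(63, 20)
y = np.arange(63) // 3 + 1                     # class labels 1..21

# One 0/1 column per class: Y[i, k] == 1 iff sample i belongs to class k + 1
Y = label_binarize(y, classes=np.arange(1, 22))

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, Y)
probs = rf.predict_proba(X)                    # list of 21 arrays
print(Y.shape, len(probs), probs[0].shape)
```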

Assume you fitted a multilabel random forest from sklearn, called it rf, and have X_test and y_test from a train/test split. You can compute the ROC curve for your first label like this:

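from sklearn import metrics

# For a multilabel forest, predict_proba returns one (n_samples, 2) array per label
probs = rf.predict_proba(X_test)
# Column 1 of probs[0] is the predicted probability that the first label is 1
fpr, tpr, threshs = metrics.roc_curve(y_test['name_of_your_first_tag'], probs[0][:, 1])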
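Extending that to all labels and actually plotting: loop over the label columns, pass column 1 of each probs[k] to roc_curve, and draw the curves with matplotlib. A sketch on random fake data; the use of matplotlib, the off-screen Agg backend, the output filename roc.png and the split size are all assumptions:

```python
import matplotlib
matplotlib.use("Agg")                          # draw off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

rng = np.random.RandomState(0)
X = rng.rand(630, 20)
y = np.arange(630) // 30 + 1                   # 21 classes, 30 samples each
Y = label_binarize(y, classes=np.arange(1, 22))  # 0/1 matrix, one column per label

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
probs = rf.predict_proba(X_test)               # one (n_test, 2) array per label

fig, ax = plt.subplots(figsize=(6, 6))
aucs = []
for k in range(Y.shape[1]):
    fpr, tpr, _ = metrics.roc_curve(y_test[:, k], probs[k][:, 1])
    aucs.append(metrics.auc(fpr, tpr))
    ax.plot(fpr, tpr, lw=1, label=f"class {k + 1} (AUC = {aucs[-1]:.2f})")
ax.plot([0, 1], [0, 1], "k--", lw=1)           # chance line
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend(fontsize="x-small", loc="lower right")
fig.savefig("roc.png")
```

With 21 curves on one axis the plot gets crowded; plotting a handful of classes, or only the micro-average, may read better.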

Hope this helps. If you provide your code and data, I can make this more specific.



Source: https://stackoverflow.com/questions/44244435/how-to-binarize-randomforest-to-plot-a-roc-in-python
