ValueError: Data is not binary and pos_label is not specified

别来无恙 submitted on 2019-12-03 23:23:47

Question


I am trying to calculate roc_auc_score, but I am getting the following error:

"ValueError: Data is not binary and pos_label is not specified"

My code snippet is as follows:

import numpy as np
from sklearn.metrics import roc_auc_score
y_scores=np.array([ 0.63, 0.53, 0.36, 0.02, 0.70 ,1 , 0.48, 0.46, 0.57])
y_true=np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])
roc_auc_score(y_true, y_scores)

Please tell me what is wrong with it.


Answer 1:


You only need to change y_true so it looks like this:

y_true=np.array([0, 1, 0, 0, 1, 1, 1, 1, 1])

Explanation: If you look at what the roc_auc_score function does in https://github.com/scikit-learn/scikit-learn/blob/0.15.X/sklearn/metrics/metrics.py you will see that y_true is evaluated as follows:

classes = np.unique(y_true)
if (pos_label is None and not (np.all(classes == [0, 1]) or
 np.all(classes == [-1, 1]) or
 np.all(classes == [0]) or
 np.all(classes == [-1]) or
 np.all(classes == [1]))):
    raise ValueError("Data is not binary and pos_label is not specified")

At the moment of execution pos_label is None, but because you are defining y_true as an array of characters, every np.all comparison is False. Since the combined result is negated, the if condition is True and the exception is raised.
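The check quoted above can be reproduced with NumPy alone to see why string labels fail it. This is only an illustrative sketch: `passes_binary_check` is a hypothetical helper written for this example, not a scikit-learn function, and `astype(int)` is one possible conversion for labels that arrive as digit strings.

```python
import numpy as np

def passes_binary_check(y_true):
    """Hypothetical re-creation of the quoted label check, assuming pos_label is None."""
    classes = np.unique(y_true)
    allowed = ([0, 1], [-1, 1], [0], [-1], [1])
    # True if the unique labels match one of the accepted label sets.
    return any(bool(np.all(classes == np.array(a))) for a in allowed)

y_str = np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])
y_int = y_str.astype(int)  # assumption: every label is a digit string

str_ok = passes_binary_check(y_str)  # string labels never equal the integers 0 or 1
int_ok = passes_binary_check(y_int)  # integer labels match [0, 1]
print(str_ok, int_ok)
```

Depending on the NumPy version, comparing a string array with integers may also emit a warning, but in either case the comparison does not evaluate to True.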




Answer 2:


The problem is in y_true=np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1']): its values are strings. Convert the values of y_true to Boolean:

y_true= '1' <= y_true
print(y_true) # [False  True False False  True  True  True  True  True]
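Putting the pieces together, a minimal end-to-end sketch with the data from the question might look like the following. The `(y_str == '1')` comparison and the `astype(int)` call are alternative conversions chosen for this example, and the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_scores = np.array([0.63, 0.53, 0.36, 0.02, 0.70, 1, 0.48, 0.46, 0.57])
y_str = np.array(['0', '1', '0', '0', '1', '1', '1', '1', '1'])

# Two equivalent ways to turn the string labels into binary labels.
y_bool = (y_str == '1')    # Boolean labels, in the spirit of this answer
y_int = y_str.astype(int)  # integer labels, in the spirit of the first answer

auc_bool = roc_auc_score(y_bool, y_scores)
auc_int = roc_auc_score(y_int, y_scores)
print(auc_bool, auc_int)
```

Both calls should give the same value, about 0.778: of the 18 positive/negative pairs in this data, 14 have the positive sample scored higher.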


Source: https://stackoverflow.com/questions/18401112/valueerror-data-is-not-binary-and-pos-label-is-not-specified
