scikit-learn .predict() default threshold

长发绾君心 2020-12-02 05:00

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability.

In a binary classification problem, is

5 answers

心在旅途 (original poster)

2020-12-02 05:37
In case someone visits this thread hoping for a ready-to-use function (Python 2.7), here is one. The cutoff is chosen to reflect the share of events (1's) in the original dataset df, so that roughly the same share of samples is predicted as 1. y_prob could be the result of the .predict_proba method, with the positive-class probability in column 1 (assuming a stratified train/test split).
```
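import numpy as np

def predict_with_cutoff(colname, y_prob, df):
    # Event rate (percentage of 1's) in the original dataset.
    n_events = df[colname].values
    event_rate = sum(n_events) / float(df.shape[0]) * 100
    # Cutoff such that roughly event_rate percent of scores lie at or above it.
    threshold = np.percentile(y_prob[:, 1], 100 - event_rate)
    print("Cutoff/threshold at: " + str(threshold))
    # Label as 1 every sample whose positive-class probability reaches the cutoff.
    y_pred = [1 if x >= threshold else 0 for x in y_prob[:, 1]]
    return y_pred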
```
Feel free to criticize/modify. Hope it helps in rare cases when class balancing is out of the question and the dataset itself is highly imbalanced.
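For anyone who wants to see the effect without a fitted model, here is a small self-contained sketch of the same idea. It compares a fixed 0.5 cutoff with a prevalence-based cutoff on synthetic scores. The helper names `labels_at_cutoff` and `prevalence_cutoff`, the beta-distributed scores, and the 5% event rate are all assumptions made for illustration; in practice `pos_prob` would be something like `y_prob[:, 1]` from `.predict_proba`.

```python
import numpy as np

def labels_at_cutoff(pos_prob, cutoff):
    """Return 0/1 labels: 1 where the positive-class probability reaches the cutoff."""
    return (np.asarray(pos_prob) >= cutoff).astype(int)

def prevalence_cutoff(pos_prob, event_rate):
    """Cutoff such that about `event_rate` (a fraction) of scores lie at or above it."""
    return np.percentile(pos_prob, 100.0 * (1.0 - event_rate))

# Synthetic stand-in for predict_proba(X)[:, 1]: 1000 scores, mostly low (assumption).
rng = np.random.RandomState(0)
pos_prob = rng.beta(1, 12, size=1000)

event_rate = 0.05  # assumed share of 1's, as in the question
cutoff = prevalence_cutoff(pos_prob, event_rate)

pred_default = labels_at_cutoff(pos_prob, 0.5)        # fixed 0.5 cutoff
pred_prevalence = labels_at_cutoff(pos_prob, cutoff)  # prevalence-based cutoff

print("cutoff: %.3f" % cutoff)
print("positives at 0.5: %d" % pred_default.sum())
print("positives at prevalence cutoff: %d" % pred_prevalence.sum())
```

With skewed scores like these, the fixed 0.5 cutoff flags very few samples, while the prevalence-based cutoff flags about 5% of them by construction. Whether that trade-off is appropriate depends on the relative cost of false positives and false negatives, so treat it as a starting point rather than a rule.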