I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability.
In a binary classification problem, is
In case someone visits this thread hoping for a ready-to-use function (Python 2.7): in this example, the cutoff is chosen so that the share of predicted events matches the share of events in the original dataset df, while y_prob could be the result of a .predict_proba method (assuming a stratified train/test split).
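import numpy as np

def predict_with_cutoff(colname, y_prob, df):
    # Percentage of events (1's) in the original dataset
    n_events = df[colname].values
    event_rate = sum(n_events) / float(df.shape[0]) * 100
    # Cutoff such that roughly event_rate percent of scores lie at or above it
    threshold = np.percentile(y_prob[:, 1], 100 - event_rate)
    print("Cutoff/threshold at: " + str(threshold))
    # Label as 1 every row whose positive-class probability reaches the cutoff
    y_pred = [1 if x >= threshold else 0 for x in y_prob[:, 1]]
    return y_pred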
Feel free to criticize/modify. Hope it helps in rare cases when class balancing is out of the question and the dataset itself is highly imbalanced.
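As one possible modification, here is a minimal vectorized sketch of the same idea that takes the labels as an array instead of a DataFrame column and returns the threshold rather than printing it. The two-argument signature and the synthetic data (`rng`, `p1`, the 5-in-100 labels) are assumptions made for illustration; in practice y_prob would come from a fitted classifier's .predict_proba.

```python
import numpy as np


def predict_with_cutoff(y_true, y_prob):
    """Label the top-scoring rows as 1 so that the predicted event rate
    roughly matches the event rate observed in y_true."""
    y_true = np.asarray(y_true)
    scores = np.asarray(y_prob)[:, 1]        # positive-class probabilities
    event_rate = y_true.mean() * 100         # percentage of 1's in the labels
    threshold = np.percentile(scores, 100 - event_rate)
    y_pred = (scores >= threshold).astype(int)
    return y_pred, threshold


# Synthetic stand-in data (hypothetical, for illustration only)
rng = np.random.RandomState(0)
y_true = np.array([1] * 5 + [0] * 95)        # 5% events, as in the question
p1 = rng.uniform(size=100)                   # fake positive-class scores
y_prob = np.column_stack([1 - p1, p1])       # same shape as predict_proba output

y_pred, threshold = predict_with_cutoff(y_true, y_prob)
print("Cutoff/threshold at: " + str(threshold))
print("Predicted events: " + str(y_pred.sum()))
```

Note that the cutoff only fixes how many rows get labeled 1; whether those rows are the right ones still depends on how well the scores rank the true events.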