roc_auc_score - Only one class present in y_true

忘了有多久 2020-12-16 14:28

I am doing k-fold cross-validation on an existing dataframe, and I need to get the AUC score. The problem is that sometimes the test data contains only 0s and no 1s!
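To illustrate why the score is undefined in that case: AUC can be read as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, and a test fold without 1s has no such pairs. The sketch below is a plain-Python illustration under that reading, not scikit-learn's implementation; `pairwise_auc` is a hypothetical helper name, and returning `None` for the undefined case is an assumption made for this example.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None when one class is missing,
    because there are no pairs to compare (convention of this sketch only).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return None  # zero pairs: the ratio would be 0 / 0
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A fold with both classes versus a fold that contains only 0s.
mixed = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
only_zeros = pairwise_auc([0, 0, 0], [0.1, 0.4, 0.35])
print(mixed, only_zeros)
```

In the second call there is nothing to rank against, which is the situation a fold with only 0s runs into.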

I tried usin

5 answers
  •  旧巷少年郎
    2020-12-16 14:59

    I am facing the same problem now, and using try-except does not solve my issue. I wrote the code below to deal with it.

    import numpy as np
    import pandas as pd


    class KFold(object):
        """K-fold splitter that puts both classes (0 and 1) into every fold."""

        def __init__(self, folds, random_state=None):
            self.folds = folds
            self.random_state = random_state

        def split(self, x, y):
            assert len(x) == len(y), 'x and y should have the same length'

            # Shuffle the labels once; the shuffled index drives the split.
            y_ = pd.DataFrame(y).iloc[:, 0].sample(frac=1, random_state=self.random_state)

            # Build the masks from the shuffled labels (y_), not from the raw y,
            # so this also works when y is a list or a NumPy array.
            event_index = list(y_[y_ == 1].index)
            non_event_index = list(y_[y_ == 0].index)

            assert len(event_index) >= self.folds, 'need at least one event (y == 1) per fold'
            assert len(non_event_index) >= self.folds, 'need at least one non-event (y == 0) per fold'

            # np.array_split always returns exactly `folds` chunks, so the two
            # classes cannot end up with a different number of folds.
            event_chunks = np.array_split(event_index, self.folds)
            non_event_chunks = np.array_split(non_event_index, self.folds)

            # Each tuple is (train_fold, valid_fold): train_fold is one chunk of
            # roughly 1/folds of the rows, valid_fold is every other row.
            indexes = []
            for event_chunk, non_event_chunk in zip(event_chunks, non_event_chunks):
                train_fold = list(non_event_chunk) + list(event_chunk)
                in_fold = set(train_fold)
                valid_fold = [k for k in y_.index if k not in in_fold]
                indexes.append((train_fold, valid_fold))

            return indexes
    

    I just wrote this code and have not tested it exhaustively; it was only tested on binary labels. I hope it is still useful.
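Because the splitter has not been tested exhaustively, one cheap check is to confirm that every part of every split really contains both classes. The sketch below uses only the standard library; `both_classes_everywhere` is a hypothetical helper, and it assumes `y` can be looked up with the same labels the splitter returns (a dict, or a list indexed by position).

```python
def both_classes_everywhere(y, splits, classes=(0, 1)):
    """Return True if each part of each split contains every class.

    y      : mapping or sequence giving the class for each index label
    splits : iterable of (first_part, second_part) index collections
    """
    wanted = set(classes)
    for parts in splits:
        for part in parts:
            present = {y[k] for k in part}
            if not wanted <= present:
                return False
    return True


# Hand-made splits: the first list is stratified, the second is not.
y = [0, 0, 0, 0, 1, 1]
good = [([0, 1, 4], [2, 3, 5]), ([2, 3, 5], [0, 1, 4])]
bad = [([0, 1, 2], [3, 4, 5])]  # first part holds only 0s

print(both_classes_everywhere(y, good), both_classes_everywhere(y, bad))
```

With the class above, a call would presumably look like `both_classes_everywhere(y, KFold(3).split(x, y))`, provided `y` is indexable by the returned labels.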
