sklearn.LabelEncoder with never seen before values

执笔经年 2020-11-27 10:37

If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set.
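For illustration, here is a minimal sketch of the failure. The fruit labels are made up for the example and are not from my real data:

```python
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
enc.fit(["apple", "banana"])  # labels seen in the (hypothetical) training set

try:
    # "cherry" was never seen during fit, so transform cannot map it
    enc.transform(["apple", "cherry"])
    failed = False
except ValueError as exc:
    failed = True
    print("transform failed:", exc)
```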

The only solution I c

12 answers
  •  青春惊慌失措
    2020-11-27 11:21

    If someone is still looking for it, here is my fix.

    Say you have:
    enc_list : list of the names of the variables that are already encoded
    enc_map : dictionary mapping each variable in enc_list to its fitted encoder
    df : dataframe in which a variable contains values the encoder in enc_map has not seen

    This works as long as a category such as "NA" or "Unknown" is already among the encoded values (the code below uses "NA").

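    for col in enc_list:
        # classes the fitted encoder already knows
        known = set(enc_map[col].classes_)
        # values in df that the encoder has never seen
        unseen = [v for v in df[col].unique() if v not in known]
        # map every unseen value to the existing 'NA' category
        df[col] = df[col].replace(unseen, 'NA')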
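To show the loop above end to end, here is a small self-contained sketch. The `color` column, the sample values, and the `color_code` output column are all hypothetical. It assumes "NA" was part of the data the encoder was fitted on, as the answer requires:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data; "NA" is included so the encoder knows it.
train = pd.DataFrame({"color": ["red", "green", "blue", "NA"]})
# Hypothetical new data with two values the encoder has never seen.
df = pd.DataFrame({"color": ["red", "purple", "blue", "orange"]})

enc_list = ["color"]
enc_map = {col: LabelEncoder().fit(train[col]) for col in enc_list}

for col in enc_list:
    known = set(enc_map[col].classes_)
    unseen = [v for v in df[col].unique() if v not in known]
    # Map unseen values to "NA" so that transform no longer fails.
    df[col] = df[col].replace(unseen, "NA")
    df[col + "_code"] = enc_map[col].transform(df[col])

print(df)
```

After the replacement, "purple" and "orange" both receive the same integer code as "NA", so the encoder cannot tell them apart. That is the trade-off of this approach.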
    
