Impute categorical missing values in scikit-learn

清歌不尽 2020-11-30 16:55

I've got pandas data with some columns of text type. There are some NaN values along with these text columns. What I'm trying to do is to impute those NaN's by skle

10 answers
  •  情话喂你
    2020-11-30 17:45

    This code fills in a series with the most frequent category:

    import pandas as pd
    import numpy as np

    # Create fake data
    m = pd.Series(list('abca'))
    m.iloc[1] = np.nan  # artificially introduce a NaN

    print('m = ')
    print(m)

    # Make dummy variables, count each category, and sort descending;
    # the first index label is the most frequent category.
    most_common = pd.get_dummies(m).sum().sort_values(ascending=False).index[0]

    def replace_most_common(x):
        # Return the most frequent category for missing values,
        # and leave every other value unchanged.
        if pd.isnull(x):
            return most_common
        else:
            return x

    new_m = m.map(replace_most_common)  # apply the function to the original data

    print('new_m = ')
    print(new_m)
    

    Outputs:

    m =
    0      a
    1    NaN
    2      c
    3      a
    dtype: object
    
    new_m =
    0    a
    1    a
    2    c
    3    a
    dtype: object
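
Since the question asks about scikit-learn, the same most-frequent fill can probably be done with `SimpleImputer` from `sklearn.impute`. The following is only a minimal sketch: the frame `df` and its column names `color` and `size` are made up for illustration, and it assumes a scikit-learn version that provides `SimpleImputer` (0.20 or later).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical example data: two text columns, each with one missing value.
# dtype=object keeps the strings and NaN in plain object columns.
df = pd.DataFrame(
    {
        "color": ["red", np.nan, "blue", "red"],
        "size": ["S", "M", "M", np.nan],
    },
    dtype=object,
)

# strategy="most_frequent" accepts string data; each column is filled
# with its own most frequent value.
imputer = SimpleImputer(strategy="most_frequent")

# fit_transform returns a NumPy array, so wrap it back into a DataFrame
# to keep the original column names and index.
filled = pd.DataFrame(
    imputer.fit_transform(df), columns=df.columns, index=df.index
)

print(filled)
```

Here the missing `color` should become `red` and the missing `size` should become `M`. If several categories are tied for most frequent, which one is chosen depends on the imputer's tie-breaking rule, so it is worth checking the result on real data.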
    
