Imputation of missing values for categories in pandas

时光说笑 2020-12-04 18:03
How can I fill NaNs in a categorical column of a pandas DataFrame with the most frequent level?
In R's randomForest package there is the na.roughfix option : A

4 Answers

被撕碎了的回忆 (OP) 2020-12-04 18:35

In recent versions of scikit-learn (0.20 and later) you can use SimpleImputer with strategy='most_frequent' to impute both numeric and categorical columns:

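import numpy as np  # needed for np.nan below
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data: one numeric column (x1) and one categorical column (x2), each with a NaN
arr = [[1., 'x'], [np.nan, 'y'], [7., 'z'], [7., 'y'], [4., np.nan]]
df1 = pd.DataFrame({'x1': [x[0] for x in arr],
                    'x2': [x[1] for x in arr]},
                   index=list('abcde'))
# 'most_frequent' replaces each NaN with the mode of its column
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
# fit_transform returns a NumPy array, so restore the column and index labels
print(pd.DataFrame(imp.fit_transform(df1),
                   columns=df1.columns,
                   index=df1.index))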
#   x1 x2
# a  1  x
# b  7  y
# c  7  z
# d  7  y
# e  4  y
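
If you would rather stay in pandas and keep the column dtypes (the SimpleImputer route above hands back a single array for the mixed frame), something along these lines may work. This is only a sketch: the helper name fill_most_frequent is made up for illustration, and storing x2 as a pandas category column is my assumption about what the question means by "category column".

```python
import numpy as np
import pandas as pd

# Same toy values as above, but x2 is stored as a pandas category column
# (an assumption about the question's setup).
df = pd.DataFrame(
    {"x1": [1.0, np.nan, 7.0, 7.0, 4.0],
     "x2": pd.Categorical(["x", "y", "z", "y", np.nan])},
    index=list("abcde"),
)

def fill_most_frequent(frame):
    """Hypothetical helper: return a copy with NaNs replaced by each column's mode."""
    out = frame.copy()
    for col in out.columns:
        modes = out[col].mode(dropna=True)  # may hold several values on ties
        if not modes.empty:                 # an all-NaN column has no mode; leave it
            out[col] = out[col].fillna(modes.iloc[0])
    return out

filled = fill_most_frequent(df)
print(filled)
print(filled.dtypes)
```

Because the fill value is taken from the column itself, it is always an existing category level, so fillna does not complain about an unknown category. On ties this just takes the first mode that pandas returns, which may or may not be what you want.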

    
             
                                                        
            
            
              
                