Imputation of missing values for categories in pandas

时光说笑 2020-12-04 18:03
The question is: how do I fill NaNs with the most frequent level for a categorical column in a pandas DataFrame?
In the R randomForest package there is the na.roughfix option: A

      
      
        
4 answers

青春惊慌失措 (original poster) 2020-12-04 18:46
              

            
            
                        
Most of the time, you would not want the same imputation strategy for every column. You might, for instance, use the column mode for categorical variables and the column mean or median for numeric columns.
For example:
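>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'num': [1., 2., 4., np.nan],
...                    'cate1': ['a', 'a', 'b', np.nan],
...                    'cate2': ['a', 'b', 'b', np.nan]})

# numeric columns: pass the Series of column means so each column gets its own mean
# (taking .iloc[0] here would fill every NaN in the frame with one scalar)
>>> df.fillna(df.select_dtypes(include='number').mean(), inplace=True)

# categorical columns: the first row of mode() holds each column's most frequent value
>>> df.fillna(df.select_dtypes(include='object').mode().iloc[0], inplace=True)

>>> print(df)
        num cate1 cate2
0  1.000000     a     a
1  2.000000     a     b
2  4.000000     b     b
3  2.333333     a     b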
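
If you want a single call closer in spirit to R's na.roughfix (median for numeric columns, most frequent level for factors), something like the following might work. This is only a minimal sketch: the helper name rough_fix is made up for illustration and is not a pandas function, and it assumes that taking the first value returned by mode() is an acceptable way to break ties. It checks each column's dtype directly, so it also covers columns stored with the category dtype.

```python
import numpy as np
import pandas as pd


def rough_fix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with NaNs filled per column.

    Numeric columns get their median, all other columns get their mode.
    """
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            fill_value = out[col].median()
        else:
            modes = out[col].mode(dropna=True)
            if modes.empty:
                continue  # column is entirely NaN; leave it untouched
            fill_value = modes.iloc[0]  # assumption: first mode wins on ties
        out[col] = out[col].fillna(fill_value)
    return out


df = pd.DataFrame({
    'num': [1., 2., 4., np.nan],
    'cate1': pd.Categorical(['a', 'a', 'b', np.nan]),  # category dtype
    'cate2': ['a', 'b', 'b', np.nan],                  # plain strings
})
filled = rough_fix(df)
print(filled)
```

For this frame the missing row is filled with num 2.0 (the median), cate1 'a' and cate2 'b', and cate1 keeps its category dtype because the fill value is already one of its levels. The original df is left unchanged since the helper works on a copy.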

    
             
                                                        
            
            
              
                