Impute categorical missing values in scikit-learn

后端未结
关注
 10  1403
清歌不尽 2020-11-30 16:55
I\'ve got pandas data with some columns of text type. There are some NaN values along with these text columns. What I\'m trying to do is to impute those NaN\'s by skle

      
      
        
          10条回答        

        
                    
            
            
                         
                
              
              
                
                   失恋的感觉
                                             
                
                
                (楼主)
            
              
              
                2020-11-30 17:39
              

            
            
                        
You can use sklearn_pandas.CategoricalImputer for the categorical columns. Details:

First, (from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow) you can have subpipelines for numerical and string/categorical features, where each subpipeline's first transformer is a selector that takes a list of column names (and the full_pipeline.fit_transform() takes a pandas DataFrame):

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values


You can then combine these sub pipelines with sklearn.pipeline.FeatureUnion, for example:

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline)
])


Now, in the num_pipeline you can simply use sklearn.preprocessing.Imputer(), but in the cat_pipline, you can use CategoricalImputer() from the sklearn_pandas package.

note: sklearn-pandas package can be installed with pip install sklearn-pandas, but it is imported as import sklearn_pandas
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它10个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复