Convert categorical data in pandas dataframe

后端未结

关注

 10  1898

I have a dataframe with this type of data (too many columns):

col1        int64
col2        int64
col3        category
col4        category
col5        categ


                      
              相关标签:


      
      
        
          10条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  轻奢々        
                
              
                            
                2020-11-27 10:26
              
            
            
                                                                       
This works for me:

pandas.factorize( ['B', 'C', 'D', 'B'] )[0]


Output:

[0, 1, 2, 0]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  感动是毒        
                
              
                            
                2020-11-27 10:29
              
            
            
                                                                       
You can do it less code like below :
f = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'),'col3':list('ababb')})

f['col1'] =f['col1'].astype('category').cat.codes
f['col2'] =f['col2'].astype('category').cat.codes
f['col3'] =f['col3'].astype('category').cat.codes

f


                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  故里飘歌        
                
              
                            
                2020-11-27 10:32
              
            
            
                                                                       
What I do is, I replace values.
Like this-
df['col'].replace(to_replace=['category_1', 'category_2', 'category_3'], value=[1, 2, 3], inplace=True)

In this way, if the col column has categorical values, they get replaced by the numerical values.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  -上瘾入骨i        
                
              
                            
                2020-11-27 10:34
              
            
            
                                                                       
If your concern was only that you making a extra column and deleting it later, just dun use a new column at the first place.

dataframe = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':list('abcab'),  'col3':list('ababb')})
dataframe.col3 = pd.Categorical.from_array(dataframe.col3).codes


You are done. Now as Categorical.from_array is deprecated, use Categorical directly

dataframe.col3 = pd.Categorical(dataframe.col3).codes


If you also need the mapping back from index to label, there is even better way for the same

dataframe.col3, mapping_index = pd.Series(dataframe.col3).factorize()


check below 

print(dataframe)
print(mapping_index.get_loc("c"))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复