Compute row percentages in pandas DataFrame?


I have my data in a pandas DataFrame, and it looks like the following:

cat  val1   val2   val3   val4
A    7      10     0      19
B    10     2      1


                      
2 Answers

轻奢々  2021-01-03 02:34

You can do this using apply:

df[['val1', 'val2', 'val3', 'val4']] = df[['val1', 'val2', 'val3', 'val4']].apply(lambda x: x/x.sum(), axis=1)


>>> df
  cat      val1      val2      val3      val4
0   A  0.194444  0.277778  0.000000  0.527778
1   B  0.370370  0.074074  0.037037  0.518519
2   C  0.119048  0.357143  0.142857  0.380952
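
For anyone who wants to run the apply approach end to end, here is a minimal self-contained sketch. Row A is taken from the question; the values for rows B and C are an assumption, chosen to be consistent with the output printed above. The names value_cols, shares and pct are illustrative only. The last step scales the fractions to percentages, which the snippet above does not do.

```python
import pandas as pd

# Sample data: row A is from the question; rows B and C are assumed,
# picked so that they reproduce the fractions shown in the answer.
df = pd.DataFrame({
    "cat": ["A", "B", "C"],
    "val1": [7, 10, 5],
    "val2": [10, 2, 15],
    "val3": [0, 1, 6],
    "val4": [19, 14, 16],
})

value_cols = ["val1", "val2", "val3", "val4"]

# axis=1 hands the lambda one row at a time, so each row is divided
# by its own total.
shares = df[value_cols].apply(lambda row: row / row.sum(), axis=1)

# Scale to percentages and put the identifier column back alongside.
pct = pd.concat([df[["cat"]], (shares * 100).round(2)], axis=1)
print(pct)
```

Building a new frame with concat leaves the original df untouched, which can be handy if the raw counts are still needed later.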

                                                                        
                                                        
            
            
              
                
北海茫月  2021-01-03 02:59

div + sum

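For a vectorised solution, divide the DataFrame by its row totals: take the sum over axis=1, then pass axis=0 to div so the totals are aligned with the row index. You can use set_index to move the identifier column out of the way and reset_index to bring it back afterwards.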

df = df.set_index('cat')
res = df.div(df.sum(axis=1), axis=0)

print(res.reset_index())

  cat      val1      val2      val3      val4
0   A  0.194444  0.277778  0.000000  0.527778
1   B  0.370370  0.074074  0.037037  0.518519
2   C  0.119048  0.357143  0.142857  0.380952
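
If you would rather not touch the index, the same division can be sketched on a column subset. This is only one possible variant: the sample values for rows B and C are assumed (inferred from the output above), and value_cols, row_totals and res are illustrative names. The comment also spells out why axis=0 matters here.

```python
import pandas as pd

# Sample data: row A is from the question; rows B and C are assumed,
# picked so that they reproduce the fractions shown in the answer.
df = pd.DataFrame({
    "cat": ["A", "B", "C"],
    "val1": [7, 10, 5],
    "val2": [10, 2, 15],
    "val3": [0, 1, 6],
    "val4": [19, 14, 16],
})

# Every column except the identifier.
value_cols = df.columns.drop("cat")

# One total per row, carrying the same index as df.
row_totals = df[value_cols].sum(axis=1)

# axis=0 aligns row_totals with the row index. With the default
# axis="columns", pandas would try to align the totals (labelled 0, 1, 2)
# with the column labels instead.
res = df.copy()
res[value_cols] = df[value_cols].div(row_totals, axis=0)
print(res)
```

Multiplying the result by 100 gives percentages rather than fractions.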

                                                                        
                                                        
            
            
              
                