Frequency tables in pandas (like plyr in R)

不思量自难忘° 2020-12-28 18:53

My problem is how to calculate frequencies on multiple variables in pandas. I have this dataframe:

d1 = pd.DataFrame({'StudentID': ["x1", "x10


      
      
        
4 answers

攒了一身酷 (original poster) 2020-12-28 19:27

There is another approach that I like to use for similar problems; it uses groupby and unstack:

import pandas as pd

d1 = pd.DataFrame({'StudentID': ["x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"],
                   'StudentGender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
                   'ExamenYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
                   'Exam': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
                   'Participated': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
                   'Passed': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes']},
                  columns=['StudentID', 'StudentGender', 'ExamenYear', 'Exam', 'Participated', 'Passed'])


(this is just the raw data from above)

d2 = d1.groupby("ExamenYear").Participated.value_counts().unstack(fill_value=0)['yes']
d3 = d1.groupby("ExamenYear").Passed.value_counts().unstack(fill_value=0)['yes']
d2.name = "Participated"
d3.name = "Passed"

pd.DataFrame(data=[d2,d3]).T
            Participated  Passed
ExamenYear                      
2007                   2       2
2008                   3       3
2009                   3       2
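
To make the chain above easier to follow, here is a minimal sketch that splits it into its three steps for the Participated column only. It uses a reduced copy of the data with just the two columns involved; the variable names (counts, table, yes_counts) are mine and not part of the original answer.

```python
import pandas as pd

# Reduced copy of the data above: only the two columns this sketch needs.
d1 = pd.DataFrame({
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008',
                   '2008', '2008', '2009', '2009', '2009'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'yes', 'yes'],
})

# Step 1: count each answer within each year. The result is a Series
# with a two-level index (ExamenYear, Participated).
counts = d1.groupby('ExamenYear')['Participated'].value_counts()

# Step 2: move the inner index level (the 'no'/'yes' values) into columns.
# fill_value=0 covers combinations that never occur, such as 2009/'no'.
table = counts.unstack(fill_value=0)

# Step 3: keep only the 'yes' column, giving one count per year.
yes_counts = table['yes']

print(table)
print(yes_counts)
```

The fill_value=0 argument matters here: without it, the missing 2009/'no' combination would show up as NaN and the counts would become floats.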


This solution is slightly more cumbersome than the one above using apply, but I find it easier to understand and extend.
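
As an illustration of how the approach might be extended to any number of columns, the groupby/value_counts/unstack chain can be wrapped in a loop. This is only a sketch: the helper name count_value_by_group and its parameters are hypothetical and do not come from the original answer, and the data is again a reduced copy of the frame above.

```python
import pandas as pd

# Reduced copy of the data from the answer: only the columns used here.
d1 = pd.DataFrame({
    'ExamenYear': ['2007', '2007', '2007', '2008', '2008',
                   '2008', '2008', '2009', '2009', '2009'],
    'Participated': ['no', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['no', 'yes', 'yes', 'yes', 'no',
               'yes', 'yes', 'yes', 'no', 'yes'],
})


def count_value_by_group(df, by, columns, value='yes'):
    """Hypothetical helper: count how often `value` occurs per group.

    Applies the groupby / value_counts / unstack chain to each column
    in `columns` and returns one result column per input column.
    """
    result = {}
    for col in columns:
        table = df.groupby(by)[col].value_counts().unstack(fill_value=0)
        # reindex guards against `value` never appearing in this column,
        # in which case the unstacked table would lack that column.
        result[col] = table.reindex(columns=[value], fill_value=0)[value]
    return pd.DataFrame(result)


summary = count_value_by_group(d1, 'ExamenYear', ['Participated', 'Passed'])
print(summary)
```

Adding another yes/no column to the summary then only requires adding its name to the list passed as columns.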
    
             
                                                        
            
            
              
                