Calculate percentages of a binary variable BY another variable in R


I want to summarise the percentage of people that have been treated BY region.

I have created a dummy dataset for this purpose:

id <- seq_len(1000)
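The rest of the dummy-data code is cut off here. To make the answers below runnable, here is a hypothetical stand-in (not the original dataset), built to match their printed output: regions A to E with 200 people each, and a treatment column coded 1 or 2 with exactly half of each region on each code. It does not reproduce the NA values in id that one answer mentions.

```r
# Hypothetical stand-in for the truncated dummy data (NOT the asker's original):
# 1000 people, 5 regions (A-E) with 200 each, treatment coded 1 or 2
id <- 1:1000
region <- rep(c("A", "B", "C", "D", "E"), each = 200)

# Alternate 1, 2, 1, 2, ... so each block of 200 (one region) has 100 of each code
treatment <- rep(c(1, 2), times = 500)

d <- data.frame(id, region, treatment)
str(d)
```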

4 answers

Answer by 无人及你 (2020-12-12 02:15)

You could also use data.table: 

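library(data.table)

# setDT() converts d to a data.table by reference (d itself is modified);
# .N is the number of rows in each region group
setDT(d)[, .(.N, prop = sum(treatment == 2) / .N),
         by = region]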
   region   N prop
1:      A 200  0.5
2:      B 200  0.5
3:      C 200  0.5
4:      D 200  0.5
5:      E 200  0.5
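For comparison, a minimal base R sketch of the same per-region summary without data.table, assuming a hypothetical d shaped like the output above. It relies on the fact that the mean of a logical vector equals sum(x) / length(x), which is what sum(treatment == 2) / .N computes.

```r
# Hypothetical data shaped like the output above: regions A-E, 200 rows each,
# treatment coded 1 or 2 (half of each region on each code)
d <- data.frame(
  region = rep(c("A", "B", "C", "D", "E"), each = 200),
  treatment = rep(c(1, 2), times = 500),
  stringsAsFactors = FALSE
)

# Group sizes (the equivalent of .N) and the share with treatment == 2;
# mean() of a logical vector is sum(x) / length(x)
n_by_region <- tapply(d$treatment, d$region, length)
prop_by_region <- tapply(d$treatment == 2, d$region, mean)

# Both tapply() calls group by the same variable, so their results share one order
res <- data.frame(
  region = names(n_by_region),
  N = as.vector(n_by_region),
  prop = as.vector(prop_by_region),
  stringsAsFactors = FALSE
)
res
```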

Answer by 遥遥无期 (2020-12-12 02:18)

For completeness, here's how you can do it using ddply() from plyr:

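library(plyr)
# Drop rows with a missing id, then summarise within each region
ddply(d[!is.na(d$id), ], .(region), summarize,
      N = length(region),            # number of people in the region
      prop = mean(treatment == 1))   # share of them with treatment 1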
#   region   N prop
# 1      A 200  0.5
# 2      B 200  0.5
# 3      C 200  0.5
# 4      D 200  0.5
# 5      E 200  0.5


This assumes you want to handle the NA values in id by dropping those observations.
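The filter above only covers missing id values. If treatment itself could be missing (an assumption; the question doesn't say), mean(treatment == 1) returns NA for any region that contains a missing value. A small hypothetical example in base R, using tapply() and the na.rm argument of mean():

```r
# Hypothetical data: treatment is missing (NA) for one person in region A
d <- data.frame(
  id = 1:6,
  region = c("A", "A", "A", "B", "B", "B"),
  treatment = c(1, NA, 2, 1, 1, 2)
)

# Without na.rm, one NA makes that whole region's proportion NA
# (region A gives NA, region B gives 2/3)
with_na <- tapply(d$treatment == 1, d$region, mean)
with_na

# With na.rm = TRUE (passed through tapply to mean), the missing value is
# dropped from both numerator and denominator (region A gives 0.5)
no_na <- tapply(d$treatment == 1, d$region, mean, na.rm = TRUE)
no_na
```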

Answer by 忘了有多久 (2020-12-12 02:19)

A dplyr solution:

library(dplyr)
d %>%
  group_by(region) %>%
  summarize(NumPat = n(),                      # patients per region
            prop = sum(treatment == 1) / n())  # share receiving treatment 1


This groups the data by region and pipes it to summarize(), which counts the patients in each group with n() and then calculates the proportion of those patients who received treatment 1.
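Since the question asks for percentages, here is a base R sketch that does the same grouping with the formula interface of aggregate() and scales the proportion by 100. The data and the treated column are hypothetical, and "treated" is assumed to mean treatment == 1, as in this answer.

```r
# Hypothetical data: regions A-E with 200 rows each, treatment coded 1 or 2
d <- data.frame(
  region = rep(c("A", "B", "C", "D", "E"), each = 200),
  treatment = rep(c(1, 2), times = 500)
)

# Hypothetical 0/1 indicator: assumes "treated" means treatment == 1
d$treated <- as.integer(d$treatment == 1)

# aggregate() applies FUN to `treated` within each region; the mean of a
# 0/1 vector is the proportion of 1s, so 100 * mean() is a percentage
pct <- aggregate(treated ~ region, data = d, FUN = function(x) 100 * mean(x))
names(pct)[2] <- "pct_treated"
pct
```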

Answer by 一向 (2020-12-12 02:26)

If I understand the question correctly, this can be done very easily (and quickly) with table() and prop.table():

prop.table(table(d$treatment, d$region))


This gives the proportion of the grand total that falls in each cell. For row- or column-wise proportions, use the margin argument of prop.table (multiply by 100 to get percentages):

prop.table(table(d$treatment, d$region), margin = 2) # column-wise: shares within each region
prop.table(table(d$treatment, d$region), margin = 1) # row-wise: shares within each treatment
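Because region is in the columns of this table, margin = 2 gives the share of each treatment within each region, which is what the question asks for. A short sketch with hypothetical data checks that each column then sums to 1 and pulls out the treatment-1 row as percentages:

```r
# Hypothetical data: regions A-E with 200 rows each, treatment coded 1 or 2
d <- data.frame(
  region = rep(c("A", "B", "C", "D", "E"), each = 200),
  treatment = rep(c(1, 2), times = 500)
)

tab <- table(d$treatment, d$region)  # rows = treatment, columns = region

# margin = 2 divides each cell by its column (region) total,
# so every column sums to 1
col_props <- prop.table(tab, margin = 2)
colSums(col_props)

# Percentage of each region that received treatment 1 (the row named "1")
round(100 * col_props["1", ], 1)
```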
