Anova test for GLM in python


离开以前 2021-01-13 20:30

I am trying to get the F-statistic and p-value for each covariate in a GLM. In Python, I am using statsmodels.formula.api to fit the GLM.

formula


      
      
        
2 Answers

萌比男神i (original poster) 2021-01-13 21:21

Here is my attempt to roll your own. 

The F-statistic for nested models is defined as:

(D_s  - D_b ) / (addtl_parameters * phi_b)

Where:


D_s is the deviance of the smaller model.
D_b is the deviance of the larger ("big") model.
addtl_parameters is the difference in degrees of freedom between the two models.
phi_b is the estimate of the dispersion parameter for the larger model.


"Statistical theory says that the F-statistic
follows an F distribution, with a numerator degrees of freedom equal to the number of
added parameters and a denominator degrees of freedom equal to n - p_b, or the number
of records minus the number of parameters in the big model."
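
For reference, the definition and the quoted distributional result can be written compactly as below. This is only a restatement of the text above; the symbol p_s (number of parameters in the small model) and the hat on phi_b are notation introduced here, and n is the number of records.

```latex
F \;=\; \frac{D_s - D_b}{(p_b - p_s)\,\hat{\phi}_b}
\;\sim\; F_{\,p_b - p_s,\; n - p_b}
```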

We translate this into code with:

from scipy import stats

def calculate_nested_f_statistic(small_model, big_model):
    """Return the F-statistic and p-value for two fitted, nested GLMs.

    The larger model must contain every parameter of the smaller one. A small
    p-value suggests that the additional parameters add explanatory power.
    """
    # Number of parameters added by the larger model
    addtl_params = big_model.df_model - small_model.df_model
    f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)
    df_numerator = addtl_params
    # df_resid is n - p_b; df_model excludes the intercept, so n - df_model is off by one
    df_denom = big_model.df_resid
    p_value = stats.f.sf(f_stat, df_numerator, df_denom)
    return (f_stat, p_value)


Here is a reproducible example, following the gamma GLM example in statsmodels (https://www.statsmodels.org/stable/glm.html):

import numpy as np
import statsmodels.api as sm

data2 = sm.datasets.scotland.load()
# Convert to NumPy arrays so column slicing works whether load() returns arrays or pandas objects
exog = sm.add_constant(np.asarray(data2.exog), prepend=False)
endog = np.asarray(data2.endog)

big_model = sm.GLM(endog, exog, family=sm.families.Gamma()).fit()
# Drop one covariate (the first column); the constant stays in the last column:
smaller_model = sm.GLM(endog, exog[:, 1:], family=sm.families.Gamma()).fit()

# Using the function defined above:
calculate_nested_f_statistic(smaller_model, big_model)
# The original post reported (9.519052917304652, 0.004914748992474178), computed with
# n - df_model as the denominator degrees of freedom. With df_resid the F-statistic is
# unchanged and the p-value is marginally larger.
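
Since the question asks for a statistic for each covariate, one possible extension is to drop each column in turn and compare the reduced fit with the full fit. The sketch below is an assumption-laden illustration, not part of the original answer: the helper name f_tests_per_covariate and its parameters (fit_columns, nested_f_test, always_keep) are hypothetical, and it is written against plain callables so it does not depend on any particular library.

```python
def f_tests_per_covariate(fit_columns, n_columns, nested_f_test, always_keep=()):
    """Drop each column in turn and test the reduced model against the full one.

    fit_columns(cols) must return a fitted model that uses only the column
    indices in `cols`; nested_f_test(small, big) must return the test result.
    Columns listed in `always_keep` (e.g. the constant) are never dropped.
    """
    all_columns = list(range(n_columns))
    full_model = fit_columns(all_columns)
    results = {}
    for j in all_columns:
        if j in always_keep:
            continue
        reduced_model = fit_columns([k for k in all_columns if k != j])
        results[j] = nested_f_test(reduced_model, full_model)
    return results

# Possible usage with the Scotland example (assumes NumPy arrays with the
# constant in the last column, and calculate_nested_f_statistic from the answer):
#   exog = np.asarray(data2.exog); endog = np.asarray(data2.endog)
#   fit = lambda cols: sm.GLM(endog, exog[:, cols], family=sm.families.Gamma()).fit()
#   f_tests_per_covariate(fit, exog.shape[1], calculate_nested_f_statistic,
#                         always_keep={exog.shape[1] - 1})
```

Each entry of the returned dictionary maps a column index to whatever the supplied test function returns, which for the function in this answer would be an (F-statistic, p-value) pair. Refitting the model once per covariate is simple but can be slow for wide designs.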


Source:
https://www.casact.org/pubs/monographs/papers/05-Goldburd-Khare-Tevet.pdf
    
             
                                                        
            
            
              
                