ANOVA test for GLM in Python

离开以前 2021-01-13 20:30

I am trying to get the F-statistic and p-value for each of the covariates in a GLM. In Python I am using statsmodels.formula.api to fit the GLM.

formula         


        
2 answers
  •  萌比男神i
    2021-01-13 21:21

    Here is my attempt at rolling your own.

    The F-statistic for nested models is defined as:

    (D_s - D_b ) / (addtl_parameters * phi_b)

    Where:

    • D_s is the deviance of the small model
    • D_b is the deviance of the larger ("big") model
    • addtl_parameters is the number of parameters added by the big model, i.e. the difference in model degrees of freedom between the two models
    • phi_b is the estimated dispersion parameter of the larger model
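
    To make the arithmetic concrete, here is a minimal sketch that evaluates the formula from summary numbers alone. The function name f_test_from_deviances, its argument names, and all the numbers are made up for illustration. The p-value is taken from an F distribution with addtl_parameters numerator degrees of freedom and n - p_b denominator degrees of freedom (records minus parameters in the big model):

```python
from scipy import stats


def f_test_from_deviances(dev_small, dev_big, added_params,
                          dispersion_big, n_obs, n_params_big):
    """Return (F, p) for a nested-model comparison from summary quantities."""
    # (D_s - D_b) / (addtl_parameters * phi_b)
    f_stat = (dev_small - dev_big) / (added_params * dispersion_big)
    # Denominator degrees of freedom: n - p_b
    df_denom = n_obs - n_params_big
    # Upper-tail probability of the F distribution
    p_value = stats.f.sf(f_stat, added_params, df_denom)
    return f_stat, p_value


# Made-up numbers, purely illustrative:
f_stat, p_value = f_test_from_deviances(
    dev_small=10.0, dev_big=6.0, added_params=2,
    dispersion_big=0.5, n_obs=30, n_params_big=5,
)
print(f_stat)  # 4.0
```

    With these inputs the deviance drops by 4.0 and the denominator is 2 * 0.5 = 1.0, so F = 4.0 on (2, 25) degrees of freedom.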

    "Statistical theory says that the F-statistic follows an F distribution, with a numerator degrees of freedom equal to the number of added parameters and a denominator degrees of freedom equal to n - p_b, or the number of records minus the number of parameters in the big model."

    We translate this into code with:

    from scipy import stats
    
    def calculate_nested_f_statistic(small_model, big_model):
        """Return (F statistic, p-value) for two fitted, nested GLMs.

        The parameter space of big_model must contain that of small_model.
        A small p-value indicates that the extra terms add explanatory power.
        """
        addtl_params = big_model.df_model - small_model.df_model
        f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)
        df_numerator = addtl_params
        # df_resid is n_obs minus the number of fitted parameters (n - p_b);
        # df_model would not do here because it excludes the intercept.
        df_denom = big_model.df_resid
        p_value = stats.f.sf(f_stat, df_numerator, df_denom)
        return (f_stat, p_value)
    

    Here is a reproducible example, following the gamma GLM example in statsmodels (https://www.statsmodels.org/stable/glm.html):

    import numpy as np
    import statsmodels.api as sm
    data2 = sm.datasets.scotland.load()
    # Convert to a plain array so column slicing works whether load() returns NumPy or pandas data:
    exog = np.asarray(sm.add_constant(data2.exog, prepend=False))
    
    big_model = sm.GLM(data2.endog, exog, family=sm.families.Gamma()).fit()
    # Drop one covariate (the first column; the constant stays as the last column):
    smaller_model = sm.GLM(data2.endog, exog[:, 1:], family=sm.families.Gamma()).fit()
    
    # Using the function defined above:
    calculate_nested_f_statistic(smaller_model, big_model)
    # F statistic ≈ 9.52, p-value ≈ 0.005
    

    Source: https://www.casact.org/pubs/monographs/papers/05-Goldburd-Khare-Tevet.pdf
