R error which says “Models were not all fitted to the same size of dataset”

栀梦 2020-12-01 21:42

I have created two generalised linear models as follows:

glm1 <-glm(Y ~ X1 + X2 + X3, family=binomial(link=logit))

glm2 <-glm(Y ~ X1 + X2, family=bino         


        
6 Answers
  •  佛祖请我去吃肉
    2020-12-01 22:08

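    By default, R drops rows with missing values separately for each model, so two models with different predictors can end up fitted to different numbers of rows, and anova() then refuses to compare them. To avoid the "models were not all fitted to the same size of dataset" error, fit both models on exactly the same subset of data. There are two simple ways to do this: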

    • either use data=glm1$model when fitting the second model
    • or retrieve the correctly subsetted dataset by using data=na.omit(orig.data[ , all.vars(formula(glm1))]) when fitting the second model
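
    To confirm that missing values are behind the error, it can help to compare how many observations each fit actually used. The sketch below uses simulated data: the data frame d and its contents are made up for illustration, and only the model formulas come from the question.

```r
# Simulated data; `d` and its values are hypothetical, not from the question.
set.seed(1)
n <- 100
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
d$Y <- rbinom(n, 1, plogis(0.5 * d$X1 - 0.5 * d$X2))
d$X3[1:10] <- NA  # X3 is missing in 10 rows

glm1 <- glm(Y ~ X1 + X2 + X3, family = binomial(link = logit), data = d)
glm2 <- glm(Y ~ X1 + X2, family = binomial(link = logit), data = d)

# Rows with an NA in any model variable are dropped per model,
# so the two fits end up using different numbers of observations.
n_obs <- c(glm1 = nobs(glm1), glm2 = nobs(glm2))
print(n_obs)

# Comparing the two fits raises the error; capture its message.
err <- tryCatch(
    anova(glm1, glm2, test = "Chisq"),
    error = function(e) conditionMessage(e)
)
print(err)
```

    If the two counts differ, the models were fitted to different subsets and one of the two fixes above applies.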

    Here's a reproducible example using lm and update (the same approach should work for glm):

    # 1st approach
    # define a convenience wrapper
    update_nested <- function(object, formula., ..., evaluate = TRUE){
        update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
    }
    
    # prepare data with NAs
    data(mtcars)
    for (i in seq_len(ncol(mtcars))) mtcars[i, i] <- NA  # one NA per column, on the diagonal
    
    xa <- lm(mpg~cyl+disp, mtcars)
    xb <- update_nested(xa, .~.-cyl)
    anova(xa, xb)
    ## Analysis of Variance Table
    ## 
    ## Model 1: mpg ~ cyl + disp
    ## Model 2: mpg ~ disp
    ##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
    ## 1     26 256.91                              
    ## 2     27 301.32 -1   -44.411 4.4945 0.04371 *
    ## ---
    ## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    # 2nd approach
    xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
    anova(xa, xc)
    ## Analysis of Variance Table
    ## 
    ## Model 1: mpg ~ cyl + disp
    ## Model 2: mpg ~ disp
    ##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
    ## 1     26 256.91                              
    ## 2     27 301.32 -1   -44.411 4.4945 0.04371 *
    ## ---
    ## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
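
    For the glm case from the question, the same two approaches might look like the sketch below. It is a minimal illustration on simulated data: the data frame d and the names glm2a, glm2b and lrt are made up here, and test = "Chisq" is one common choice for comparing nested binomial models rather than something the question specifies.

```r
# Simulated data; `d` and its values are hypothetical, not from the question.
set.seed(1)
n <- 100
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
d$Y <- rbinom(n, 1, plogis(0.5 * d$X1 - 0.5 * d$X2))
d$X3[1:10] <- NA  # X3 is missing in 10 rows

# Full model: fitted on the 90 complete rows
glm1 <- glm(Y ~ X1 + X2 + X3, family = binomial(link = logit), data = d)

# 1st approach: refit the reduced model on the model frame stored in glm1
glm2a <- update(glm1, . ~ . - X3, data = glm1$model)

# 2nd approach: subset the original data to the complete cases of
# the variables used by the full model
glm2b <- update(glm1, . ~ . - X3,
                data = na.omit(d[, all.vars(formula(glm1))]))

# Both reduced models now use the same rows as the full model
stopifnot(nobs(glm1) == nobs(glm2a), nobs(glm1) == nobs(glm2b))

# Likelihood-ratio comparison of the nested models
lrt <- anova(glm2a, glm1, test = "Chisq")
print(lrt)
```

    One caveat, stated as an assumption about typical usage: if the formula contains transformed terms such as log(X1), the stored model frame holds the transformed column rather than the original variable, so the second approach is likely the safer choice in that case.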
    

    See also:

    • How to update `lm` or `glm` model on same subset of data?
