Cross validation for glm() models

南旧 2021-01-31 11:33

I'm trying to do a 10-fold cross validation for some glm models that I have built earlier in R. I'm a little confused about the cv.glm() function in the boo

2 Answers
  •  误落风尘
    2021-01-31 12:37

    @Roman provided some answers in his comments; however, your questions can be answered by inspecting the source code of cv.glm.

    I believe this bit of code splits the data set randomly into K folds, adjusting K to a nearby workable value if K does not divide n:

    if ((K > n) || (K <= 1)) 
        stop("'K' outside allowable range")
    K.o <- K
    K <- round(K)
    kvals <- unique(round(n/(1L:floor(n/2))))
    temp <- abs(kvals - K)
    if (!any(temp == 0)) 
        K <- kvals[temp == min(temp)][1L]
    if (K != K.o) 
        warning(gettextf("'K' has been set to %f", K), domain = NA)
    f <- ceiling(n/K)
    s <- sample0(rep(1L:K, f), n)
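
To make the fold assignment easier to experiment with, here is a minimal base R sketch of the same logic. The wrapper name `assign_folds` is hypothetical and not part of the boot package, and the `sample.int()` line is my assumed stand-in for the internal `sample0()` helper, which the excerpt does not show.

```r
# Hypothetical helper mirroring the excerpt above; not part of the boot package.
assign_folds <- function(n, K) {
  if ((K > n) || (K <= 1)) stop("'K' outside allowable range")
  K.o <- K
  K <- round(K)
  # Candidate values of K: n divided by each fold size 1..floor(n/2), rounded
  kvals <- unique(round(n / (1L:floor(n / 2))))
  temp <- abs(kvals - K)
  # If the requested K is not a candidate, take the closest one
  if (!any(temp == 0)) K <- kvals[temp == min(temp)][1L]
  if (K != K.o) warning(gettextf("'K' has been set to %f", K), domain = NA)
  f <- ceiling(n / K)
  # Assumed stand-in for sample0(): draw n fold labels without replacement
  labels <- rep(1L:K, f)
  s <- labels[sample.int(length(labels), n)]
  list(K = K, s = s)
}

set.seed(1)
res_even <- assign_folds(100, 10)                 # 10 divides 100, K is kept
table(res_even$s)                                 # fold sizes

res_adj <- suppressWarnings(assign_folds(10, 4))  # 4 is not a candidate for n = 10
res_adj$K                                         # the adjusted K
```

When K does not divide n, the labels vector is longer than n, so the fold sizes can differ slightly from one random draw to the next.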
    

    This next bit shows that the delta value is NOT the root mean squared error. As the help file says, "The default is the average squared error function." What does this mean? We can see it by inspecting the function declaration:

    function (data, glmfit, cost = function(y, yhat) mean((y - yhat)^2), 
        K = n) 
    

    which shows that, within each fold, we calculate the average of the squared errors, where the error is the usual difference between the actual response and the predicted response.
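
As a small illustration of the difference, the default cost from the signature above can be evaluated by hand. The vectors `y` and `yhat` are made-up values, and `cost_default` is just a local copy of the default argument.

```r
# Local copy of the default cost shown in the cv.glm signature: mean squared error
cost_default <- function(y, yhat) mean((y - yhat)^2)

y    <- c(1, 2, 3)   # made-up observed responses
yhat <- c(1, 2, 5)   # made-up predictions

mse  <- cost_default(y, yhat)  # (0 + 0 + 4) / 3
rmse <- sqrt(mse)              # root mean squared error, a different quantity
```

If you want an RMSE, take the square root of the reported value yourself.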

    delta[1] is simply the weighted average of these per-fold costs, with each fold weighted by its share of the observations (n.s[i]/n). See my inline comments in the code of cv.glm:

    for (i in seq_len(ms)) {
        j.out <- seq_len(n)[(s == i)]
        j.in <- seq_len(n)[(s != i)]
        Call$data <- data[j.in, , drop = FALSE]
        d.glm <- eval.parent(Call)
        p.alpha <- n.s[i]/n # weight of fold i: its share of the n observations
        cost.i <- cost(glm.y[j.out], predict(d.glm, data[j.out, 
            , drop = FALSE], type = "response"))
        CV <- CV + p.alpha * cost.i # add the weighted fold cost to the running total
        cost.0 <- cost.0 - p.alpha * cost(glm.y, predict(d.glm, 
            data, type = "response"))
    }
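
The weighted average can be reproduced by hand with a rough base R sketch. This is not the cv.glm implementation itself: the `mtcars` data, the formula `mpg ~ wt + hp`, and K = 4 are my own choices for illustration, and the bias-adjustment part (`cost.0`) is left out.

```r
set.seed(42)
data <- mtcars                       # built-in data set, used only for illustration
n <- nrow(data)                      # 32 observations
K <- 4                               # divides 32, so K needs no adjustment
cost <- function(y, yhat) mean((y - yhat)^2)

s <- sample(rep(seq_len(K), ceiling(n / K)), n)  # random fold labels
n.s <- table(s)                                   # fold sizes

CV <- 0    # weighted average of the per-fold costs
sse <- 0   # pooled sum of squared out-of-fold errors, kept for comparison
for (i in seq_len(K)) {
  j.out <- which(s == i)             # held-out rows
  j.in  <- which(s != i)             # training rows
  fit <- glm(mpg ~ wt + hp, data = data[j.in, , drop = FALSE])  # gaussian family by default
  pred <- predict(fit, data[j.out, , drop = FALSE], type = "response")
  p.alpha <- n.s[[i]] / n            # weight of fold i
  cost.i <- cost(data$mpg[j.out], pred)
  CV <- CV + p.alpha * cost.i
  sse <- sse + sum((data$mpg[j.out] - pred)^2)
}
CV
```

With the mean squared error cost, the weighted average of the fold costs equals the pooled out-of-fold mean squared error (`sse / n`), which is why the weights n.s[i]/n matter when the folds are not all the same size.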
    
