cost function in cv.glm of boot library in R

忘掉有多难 2021-02-06 10:27

I am trying to use the cross-validation function cv.glm from the boot package in R to determine the number of misclassifications when a glm logistic regression is applied.

4 answers
  •  迷失自我
    2021-02-06 10:46

    It sounds like you might do well to just use the cost function (i.e. the one named cost) defined further down in the "Examples" section of ?cv.glm. Quoting from that section:

     # [...] Since the response is a binary variable an
     # appropriate cost function is
     cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
    

    This does essentially what you were trying to do with your example. Replacing your "no" and "yes" with 0 and 1, let's say you have two vectors, predict and response. Then cost() is nicely designed to take them and return the misclassification rate, i.e. the proportion of cases whose predicted probability falls on the wrong side of 0.5:

    ## Simulate some reasonable data
    set.seed(1)
    predict <- seq(0.1, 0.9, by=0.1)
    response <-  rbinom(n=length(predict), prob=predict, size=1)
    response
    # [1] 0 0 0 1 0 0 0 1 1
    
    ## Demonstrate the function 'cost()' in action
    cost(response, predict)
    # [1] 0.3333333  ## Which is right, as 3/9 elements (4, 6, & 7) are misclassified
                     ## (assuming you use 0.5 as the cutoff for your predictions).
    

    I'm guessing the trickiest bit of this will be getting your mind fully wrapped around the idea of passing a function in as an argument. (For me, at least, that was for the longest time the hardest part of using the boot package, which requires that move in a fair number of places.)
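
    To make the "function as an argument" step concrete, here is a minimal sketch of handing cost() to cv.glm(). The simulated data and the names dat, fit and cv_res are made up for illustration and are not from the question; substitute your own data frame and fitted glm.

    ```r
    library(boot)

    ## Hypothetical data: one predictor and a 0/1 response
    set.seed(42)
    n   <- 200
    x   <- rnorm(n)
    y   <- rbinom(n, size = 1, prob = plogis(0.5 + 1.5 * x))
    dat <- data.frame(x = x, y = y)

    ## Logistic regression fitted on the full data
    fit <- glm(y ~ x, family = binomial, data = dat)

    ## Cost function from the "Examples" section of ?cv.glm
    cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

    ## The function itself is passed as the 'cost' argument; no parentheses
    cv_res <- cv.glm(dat, fit, cost = cost, K = 10)

    ## First element of 'delta' is the raw cross-validated misclassification rate
    cv_res$delta[1]
    ```

    cv.glm() calls cost() itself, with the observed responses as the first argument and the predicted probabilities as the second, so all you have to supply is the function object.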


    Added on 2016-03-22:

    The function cost() given above is, in my opinion, unnecessarily obfuscated; for a 0/1 response, the following alternative does exactly the same thing but in a more expressive way:

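    cost <- function(r, pi = 0) {
        ## Misclassified: observed 1 but predicted below 0.5, or observed 0 but predicted above 0.5
        mean(((pi < 0.5) & r == 1) | ((pi > 0.5) & r == 0))
    }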
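
    If you want to convince yourself that the two versions agree, a quick check along these lines should do. The names cost_abs and cost_logical and the simulated vectors p and r are just placeholders for this sketch.

    ```r
    ## The two definitions side by side, renamed so they can coexist
    cost_abs     <- function(r, pi = 0) mean(abs(r - pi) > 0.5)
    cost_logical <- function(r, pi = 0) {
        mean(((pi < 0.5) & r == 1) | ((pi > 0.5) & r == 0))
    }

    ## Hypothetical predicted probabilities and matching 0/1 responses
    set.seed(1)
    p <- runif(1000)
    r <- rbinom(1000, size = 1, prob = p)

    ## Both should report the same misclassification rate
    all.equal(cost_abs(r, p), cost_logical(r, p))
    ```

    Note that both versions treat a prediction of exactly 0.5 as correct, whatever the observed response is.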
    
