cost function in cv.glm of boot library in R

后端未结

关注

 4  1108

忘掉有多难 2021-02-06 10:27

I am trying to use the crossvalidation cv.glm function from the boot library in R to determine the number of misclassifications when a glm logistic regression is applied.

4条回答

迷失自我 (楼主)

2021-02-06 10:46
It sounds like you might do well to just use the cost function (i.e. the one named cost) defined further down in the "Examples" section of ?cv.glm. Quoting from that section:
```
 # [...] Since the response is a binary variable an
 # appropriate cost function is
 cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
```
This does essentially what you were trying to do with your example. Replacing your "no" and "yes" with 0 and 1, lets say you have two vectors, predict and response. Then cost() is nicely designed to take them and return the mean classification rate:
```
## Simulate some reasonable data
set.seed(1)
predict <- seq(0.1, 0.9, by=0.1)
response <-  rbinom(n=length(predict), prob=predict, size=1)
response
# [1] 0 0 0 1 0 0 0 1 1

## Demonstrate the function 'cost()' in action
cost(response, predict)
# [1] 0.3333333  ## Which is right, as 3/9 elements (4, 6, & 7) are misclassified
                 ## (assuming you use 0.5 as the cutoff for your predictions).
```
I'm guessing the trickiest bit of this will be just getting your mind fully wrapped around the idea of passing a function in as an argument. (At least that was for me, for the longest time, the hardest part of using the boot package, which requires that move in a fair number of places.)

Added on 2016-03-22:

The function cost(), given above is in my opinion unnecessarily obfuscated; the following alternative does exactly the same thing but in a more expressive way:
```
cost <- function(r, pi = 0) { 
        mean((pi < 0.5) & r==1 | (pi > 0.5) & r==0)
}
```
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...