Why are probabilities and response in ksvm in R not consistent?

轮回少年 2020-12-18 23:41

I am using ksvm from the kernlab package in R to predict probabilities, using the type="probabilities" option in predict.ksvm. Howev

1 answer

無奈伤痛 (original poster)

2020-12-19 00:17
If you look at the decision values and votes, they seem to be more in line with the responses:
```
> predict(out, newdata = testdat, type = "response")
[1] 0 0 1 1
Levels: 0 1
> predict(out, newdata = testdat, type = "decision")
            [,1]
[1,] -0.07077917
[2,] -0.01762016
[3,]  0.02210974
[4,]  0.04762563
> predict(out, newdata = testdat, type = "votes")
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    0    0    1    1
> predict(out, newdata = testdat, type = "prob")
             0         1
[1,] 0.7198132 0.2801868
[2,] 0.6987129 0.3012871
[3,] 0.6823679 0.3176321
[4,] 0.6716249 0.3283751
```
The kernlab help pages (?predict.ksvm) link to the paper "Probability Estimates for Multi-class Classification by Pairwise Coupling" by T.F. Wu, C.J. Lin, and R.C. Weng.

Section 7.3 of that paper says that the decisions and probabilities can differ:

> ...We explain why the results by probability-based and decision-value-based methods can be so distinct. For some problems, the parameters selected by δDV are quite different from those by the other five rules. In waveform, at some parameters all probability-based methods gives much higher cross validation accuracy than δDV. We observe, for example, the decision values of validation sets are in [0.73, 0.97] and [0.93, 1.02] for data in two classes; hence, all data in the validation sets are classified as in one class and the error is high. On the contrary, the probability-based methods fit the decision values by a sigmoid function, which can better separate the two classes by cutting at a decision value around 0.95. This observation shed some light on the difference between probability-based and decision-value based methods...

I'm not familiar enough with these methods to understand the issue fully, but maybe you are. It looks like there are two distinct prediction methods: one based on decision values and one based on probabilities. type = "response" appears to use the decision-value method, not the one used to predict probabilities, so the two can disagree.
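
To make the mismatch concrete, here is a minimal sketch in base R that uses only the numbers printed in the output above. It assumes the probabilities are a sigmoid of the decision value, as the quoted passage describes. The coefficients `a` and `b` are backed out of the four printed rows by a straight-line fit on the logit scale; they are not read from kernlab's internals, so treat them as illustrative.

```r
# Values copied from the predict() output shown earlier in this answer.
decision <- c(-0.07077917, -0.01762016, 0.02210974, 0.04762563)
prob_1   <- c(0.2801868, 0.3012871, 0.3176321, 0.3283751)

# Class implied by the sign of the decision value
# (this agrees with the type = "response" output: 0 0 1 1).
class_by_sign <- as.integer(decision > 0)

# Class implied by cutting the class-1 probability at 0.5.
class_by_prob <- as.integer(prob_1 > 0.5)

# Assumption: P(y = 1 | f) = 1 / (1 + exp(-(a + b * f))).
# On the logit scale this is a straight line in f, so fit it with lm().
fit <- lm(qlogis(prob_1) ~ decision)
a <- unname(coef(fit)[1])
b <- unname(coef(fit)[2])

# Decision value at which the fitted sigmoid crosses 0.5.
crossing <- -a / b

print(class_by_sign)
print(class_by_prob)
print(crossing)
```

Under that assumption the sigmoid crosses 0.5 at a decision value well above every test point, so all four rows fall below 0.5 for class 1, while the sign rule cuts at 0 and labels rows 3 and 4 as class 1.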