Optimizing for accuracy instead of loss in Keras model

攒了一身酷 · asked 2021-01-07 02:02

If I correctly understood the significance of the loss function to the model, it directs the model to be trained based on minimizing the loss value. So, for example, if I want my model to reach the best possible accuracy, shouldn't the focus of training be to maximize accuracy (or minimize 1/accuracy) instead of minimizing MSE?

3 Answers
  •  孤独总比滥情好
    2021-01-07 02:33

    To start with, the code snippet you have used as example:

    model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
    

    is actually invalid (although Keras will not produce any error or warning) for a very simple and elementary reason: MSE is a valid loss for regression problems, for which accuracy is meaningless (accuracy is meaningful only for classification problems, where MSE in turn is not a valid loss function). For details (including a code example), see my own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see my own answer in this thread.
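    To see concretely why accuracy is meaningless in a regression setting, here is a minimal plain-Python sketch (toy numbers assumed, not from your model): exact-match "accuracy" on continuous targets is essentially always zero, even for predictions that are very good by the MSE criterion.

    ```python
    # Toy regression targets and near-perfect continuous predictions (assumed data)
    y_true = [2.50, 0.00, 2.10, 7.80]
    y_pred = [2.48, 0.05, 2.11, 7.79]

    n = len(y_true)
    # Mean squared error: small -> informative signal about fit quality
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Exact-match "accuracy": continuous predictions virtually never equal targets
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n

    print(mse)  # a small positive number
    print(acc)  # 0.0 -- "accuracy" tells you nothing useful here
    ```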

    Turning to your general question: in regression settings we usually don't need a separate performance metric; we normally use the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be

    model.compile(loss='mean_squared_error', optimizer='sgd')
    

    without any metrics specified. We could of course add metrics=['mse'], but this is redundant and not really needed. Sometimes people use something like

    model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
    

    i.e. optimise the model according to the MSE loss, but also report its performance in terms of the mean absolute error (MAE) in addition to MSE.
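    What those two reported metrics measure can be sketched in plain Python (toy numbers assumed); note how the squared error weights the single large error much more heavily than the absolute error does:

    ```python
    # Toy targets and predictions (assumed data); the last prediction is off by 1.0
    y_true = [3.0, -0.5, 2.0, 7.0]
    y_pred = [2.5, 0.0, 2.0, 8.0]

    n = len(y_true)
    # Mean squared error: squares each residual, so large errors dominate
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Mean absolute error: linear in the residual, less sensitive to outliers
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

    print(mse)  # 0.375
    print(mae)  # 0.5
    ```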

    Now, your question:

    shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE?

    is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a differentiable proxy function to use as the loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
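    The non-differentiability point can be illustrated with a tiny sketch for a single binary example (assumed setup: true label 1, decision threshold 0.5): accuracy is a step function of the predicted probability, so improving the prediction from 0.6 to 0.8 leaves it flat, while the cross entropy decreases smoothly and thus gives the optimiser a direction to move in.

    ```python
    import math

    def accuracy(p):
        # Hard 0/1 score: the prediction counts as correct iff p >= 0.5,
        # so this is a step function of p with zero gradient almost everywhere
        return 1.0 if p >= 0.5 else 0.0

    def cross_entropy(p):
        # Binary cross entropy for true label y = 1: smooth and decreasing in p
        return -math.log(p)

    for p in (0.6, 0.7, 0.8):
        print(p, accuracy(p), round(cross_entropy(p), 4))
    # accuracy is 1.0 for all three probabilities, while the cross entropy
    # keeps shrinking -- only the latter signals that 0.8 is better than 0.6
    ```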

    Rather unsurprisingly, this question of yours pops up from time to time, albeit in slightly different contexts; see for example my own answers in

    • Cost function training target versus accuracy desired goal
    • Targeting a specific metric to optimize in tensorflow

    For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful:

    • Loss & accuracy - Are these reasonable learning curves?
    • How does Keras evaluate the accuracy?
