Optimizing for accuracy instead of loss in Keras model

攒了一身酷 · asked 2021-01-07 02:02

If I correctly understood the significance of the loss function to the model, it directs the model to be trained based on minimizing the loss value. So, for example, if I want my model to reach the best possible accuracy, shouldn't the focus of training be to maximize accuracy (or minimize 1/accuracy) instead of minimizing MSE?

3 Answers
  •  孤独总比滥情好
    2021-01-07 02:33

    To start with, the code snippet you have used as example:

    model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
    

    is actually invalid (although Keras will not produce any error or warning) for a very simple and elementary reason: MSE is a valid loss for regression problems, for which accuracy is meaningless (accuracy is meaningful only for classification problems, where MSE in turn is not a valid loss function). For details (including a code example), see my own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see my own answer in this thread.
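    To see concretely why accuracy is meaningless in a regression setting, here is a minimal plain-Python sketch (toy numbers assumed, not from your model): exact-match "accuracy" on continuous targets is essentially always zero, even for predictions that are very good by the MSE criterion.

    ```python
    # Toy regression targets and near-perfect continuous predictions (assumed data)
    y_true = [2.50, 0.00, 2.10, 7.80]
    y_pred = [2.48, 0.05, 2.11, 7.79]

    n = len(y_true)
    # Mean squared error: small -> informative signal about fit quality
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Exact-match "accuracy": continuous predictions virtually never equal targets
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n

    print(mse)  # a small positive number
    print(acc)  # 0.0 -- "accuracy" tells you nothing useful here
    ```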

    Turning to your general question: in regression settings we usually don't need a separate performance metric; we normally use the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be

    model.compile(loss='mean_squared_error', optimizer='sgd')
    

    without any metrics specified. We could of course add metrics=['mse'], but this is redundant and not really needed. Sometimes people use something like

    model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
    

    i.e. optimise the model according to the MSE loss, but also report its performance in terms of the mean absolute error (MAE) in addition to MSE.
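    What those two reported metrics measure can be sketched in plain Python (toy numbers assumed); note how the squared error weights the single large error much more heavily than the absolute error does:

    ```python
    # Toy targets and predictions (assumed data); the last prediction is off by 1.0
    y_true = [3.0, -0.5, 2.0, 7.0]
    y_pred = [2.5, 0.0, 2.0, 8.0]

    n = len(y_true)
    # Mean squared error: squares each residual, so large errors dominate
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Mean absolute error: linear in the residual, less sensitive to outliers
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

    print(mse)  # 0.375
    print(mae)  # 0.5
    ```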

    Now, your question:

    shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE?

    is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a differentiable proxy function to use as the loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
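    The non-differentiability point can be illustrated with a tiny sketch for a single binary example (assumed setup: true label 1, decision threshold 0.5): accuracy is a step function of the predicted probability, so improving the prediction from 0.6 to 0.8 leaves it flat, while the cross entropy decreases smoothly and thus gives the optimiser a direction to move in.

    ```python
    import math

    def accuracy(p):
        # Hard 0/1 score: the prediction counts as correct iff p >= 0.5,
        # so this is a step function of p with zero gradient almost everywhere
        return 1.0 if p >= 0.5 else 0.0

    def cross_entropy(p):
        # Binary cross entropy for true label y = 1: smooth and decreasing in p
        return -math.log(p)

    for p in (0.6, 0.7, 0.8):
        print(p, accuracy(p), round(cross_entropy(p), 4))
    # accuracy is 1.0 for all three probabilities, while the cross entropy
    # keeps shrinking -- only the latter signals that 0.8 is better than 0.6
    ```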

    Rather unsurprisingly, this question of yours pops up from time to time, albeit in slightly different contexts; see for example my own answers in

    • Cost function training target versus accuracy desired goal
    • Targeting a specific metric to optimize in tensorflow

    For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful:

    • Loss & accuracy - Are these reasonable learning curves?
    • How does Keras evaluate the accuracy?
