Why not use mean squared error for classification problems?

Question

I am trying to implement a simple binary classifier using an LSTM RNN and still cannot figure out the correct loss function for the network. The issue is that when I use binary_crossentropy as the loss function, the loss values for training and testing are relatively high compared to using mean_squared_error.

While researching, I came across justifications that binary cross-entropy should be used for classification problems and MSE for regression problems. However, in my case, I am getting better accuracy and lower loss values with MSE for binary classification.

I am not sure how to justify these results. I am completely new to AI and ML techniques.

Answer 1:

I would like to share my understanding of MSE and binary_crossentropy.

In classification, we take the argmax() of the predicted probabilities for each instance.

Now consider a binary classifier where the model predicts the probabilities (.49, .51). In this case the model returns "1" as its prediction.

Assume the actual label is also "1".

In that case, MSE measured on the predicted label returns 0 as the loss, and even measured on the predicted probability it is only (1 - .51)^2 ≈ 0.24, whereas binary_crossentropy returns a more tangible value, -ln(.51) ≈ 0.67. And if the trained model predicts similar probabilities for every data sample, binary_crossentropy accumulates a large total loss, whereas MSE stays small (0 on the predicted labels).

Judged by MSE on the predicted labels, it is a perfect model, but in reality it is not that good a model. That is why we should not use MSE for classification.
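To put numbers on the (.49, .51) example above, here is a minimal sketch in plain Python. It is an illustration only, not the answerer's code: the helper names `squared_error` and `binary_cross_entropy` are hypothetical, and the 0.5 threshold is an assumption standing in for argmax() in the binary case. The squared error is evaluated both on the predicted probability and on the thresholded label, since it is exactly 0 only in the latter case.

```python
import math


def squared_error(y_true, y_pred):
    """Squared error for a single sample (hypothetical helper)."""
    return (y_true - y_pred) ** 2


def binary_cross_entropy(y_true, p):
    """Binary cross-entropy for a single sample; p is P(label == 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


y_true = 1
p = 0.51  # predicted probability of class "1"

# Assumed decision rule: predict "1" when p >= 0.5 (argmax for two classes).
hard_prediction = 1 if p >= 0.5 else 0

mse_on_probability = squared_error(y_true, p)               # (1 - 0.51)^2
mse_on_hard_label = squared_error(y_true, hard_prediction)  # (1 - 1)^2
bce = binary_cross_entropy(y_true, p)                       # -ln(0.51)

print(f"squared error on probability: {mse_on_probability:.4f}")
print(f"squared error on hard label:  {mse_on_hard_label:.4f}")
print(f"binary cross-entropy:         {bce:.4f}")
```

For this barely-confident but correct prediction, cross-entropy comes out noticeably larger than the squared error on the probability, and the squared error on the thresholded label vanishes entirely.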

Answer 2:

I would like to illustrate it with an example. Assume a 6-class classification problem.

Assume true probabilities = [1, 0, 0, 0, 0, 0]

Case 1: Predicted probabilities = [0.2, 0.16, 0.16, 0.16, 0.16, 0.16]

Case 2: Predicted probabilities = [0.4, 0.5, 0.1, 0, 0, 0]

The MSE in Case 1 and Case 2 is 0.128 and 0.1033, respectively.

Although Case 1 correctly predicts class 1 for the instance, while Case 2 gives its highest probability to class 2, the loss in Case 1 is higher than the loss in Case 2.
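The two MSE values and the predicted classes can be checked with a short sketch in plain Python. The helpers `mse` and `argmax` are hypothetical names written for this illustration, and class indices are 0-based here, so index 0 corresponds to "class 1" in the text.

```python
def mse(y_true, y_pred):
    """Mean squared error over the class dimension (hypothetical helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


y_true = [1, 0, 0, 0, 0, 0]
case_1 = [0.2, 0.16, 0.16, 0.16, 0.16, 0.16]
case_2 = [0.4, 0.5, 0.1, 0, 0, 0]

mse_1 = mse(y_true, case_1)
mse_2 = mse(y_true, case_2)

# Index 0 is the true class; Case 2 picks index 1 instead.
print(f"Case 1: predicted index {argmax(case_1)}, MSE = {mse_1:.4f}")
print(f"Case 2: predicted index {argmax(case_2)}, MSE = {mse_2:.4f}")
```

Case 1 selects the true class yet receives the larger MSE, which is the mismatch between loss and classification outcome that the example points to.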

Source: https://stackoverflow.com/questions/56013688/why-not-to-use-mean-square-error-for-classification-problem

Tags

python

keras

lstm

cross-entropy

mean-square-error