Which loss function is better than MSE for temperature prediction?

混江龙づ霸主 submitted on 2019-12-06 11:03:30

Let me give a rather theoretical explanation of the choice of loss function. As you may guess, it all depends on the data.

MSE has a nice probabilistic interpretation: minimizing it corresponds to MLE (maximum likelihood estimation) under the assumption that the conditional distribution p(y|x) is Gaussian with a constant variance, p(y|x) ~ N(mu, sigma^2), where mu is the model's prediction f(x). Since the MLE converges to the true parameter value as the amount of data grows (under standard regularity conditions), this means that, if the assumption holds, the minimum you find is very likely to be the best fit you can possibly get. Of course, you may find a local instead of a global minimum, and there is also an implicit assumption that your training data represent the distribution of x well. But this kind of uncertainty is inevitable, so realistically we just accept it.
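To make the MSE/Gaussian correspondence concrete, here is a minimal NumPy sketch. It assumes a constant-only model (the same prediction for every sample) and synthetic "temperature" readings; the function names are illustrative, not from any library. With sigma fixed, the average Gaussian negative log-likelihood equals a constant plus MSE / (2 sigma^2), so a grid search over predictions lands on the same value for both criteria: the sample mean.

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error."""
    return np.mean((y - y_pred) ** 2)

def gaussian_nll(y, y_pred, sigma=1.0):
    """Average negative log-likelihood of y under N(y_pred, sigma^2) with fixed sigma."""
    r = y - y_pred
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2) + r**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
y = rng.normal(loc=15.0, scale=3.0, size=1000)  # synthetic temperatures (assumption)

# Constant-only "model": try every candidate prediction on a grid.
candidates = np.linspace(10.0, 20.0, 2001)
best_mse = candidates[np.argmin([mse(y, c) for c in candidates])]
best_nll = candidates[np.argmin([gaussian_nll(y, c, sigma=3.0) for c in candidates])]

# Both criteria pick (up to grid resolution) the sample mean.
print(best_mse, best_nll, y.mean())
```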

Moving on, minimizing the L1 loss (absolute difference) is equivalent to maximum likelihood estimation under the assumption that p(y|x) has a Laplace distribution centered on the prediction. And the same conclusion holds: if the data fit this distribution, no other loss will work better than the L1 loss.
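A matching sketch for the Laplace case, again with a hypothetical constant-only model and synthetic Laplace-distributed readings. With a fixed scale b, the average Laplace negative log-likelihood is log(2b) + MAE / b, so it is minimized by the same prediction as the L1 loss, which for a constant model is the sample median rather than the mean.

```python
import numpy as np

def mae(y, y_pred):
    """Mean absolute error (L1 loss)."""
    return np.mean(np.abs(y - y_pred))

def laplace_nll(y, y_pred, b=1.0):
    """Average negative log-likelihood of y under Laplace(y_pred, b) with fixed scale b."""
    r = y - y_pred
    return np.mean(np.log(2 * b) + np.abs(r) / b)

rng = np.random.default_rng(1)
# Synthetic readings (assumption); an odd sample size gives a unique median.
y = rng.laplace(loc=15.0, scale=2.0, size=1001)

candidates = np.linspace(10.0, 20.0, 2001)
best_mae = candidates[np.argmin([mae(y, c) for c in candidates])]
best_nll = candidates[np.argmin([laplace_nll(y, c, b=2.0) for c in candidates])]

# Both criteria pick (up to grid resolution) the sample median, not the mean.
print(best_mae, best_nll, np.median(y), y.mean())
```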

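Huber loss has a less familiar probabilistic interpretation: up to a constant, it is the negative log-likelihood of a density that is Gaussian near the center and Laplace in the tails. It sits somewhere between L1 and L2, and delta controls which one it resembles: for large delta it behaves like (half) the squared error, and for small delta it behaves like the absolute error scaled by delta.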
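For reference, a minimal sketch of the standard Huber loss on a residual r = y - y_pred (quadratic for |r| <= delta, linear beyond), showing how delta moves it between the two extremes. The function name `huber` is just illustrative.

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: 0.5 * r^2 for |r| <= delta, delta * (|r| - 0.5 * delta) otherwise."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

r = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # example residuals y - y_pred

print(huber(r, delta=1.0))          # values 2.5, 0.125, 0.0, 0.125, 2.5
print(huber(r, delta=100.0))        # large delta: identical to 0.5 * r**2 for these residuals
print(huber(r, delta=1e-3) / 1e-3)  # small delta: huber / delta approaches |r| (L1)
```

Libraries such as PyTorch provide the same function (`torch.nn.HuberLoss`, with a `delta` argument), so in practice you would usually tune delta rather than reimplement the loss.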

How does this help you find the right loss function? First of all, it means that no loss is superior to the others by default. Secondly, the better you understand the data, the more confident you can be that your choice of loss function is correct. Of course, you can simply cross-validate all of these options and select the best one. But here is a good reason to do this kind of analysis anyway: when you are confident about the data distribution, you will see steady improvement as you add training data and increase model complexity. Otherwise, the model may never generalize well.
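One possible way to do this kind of analysis, sketched under stated assumptions: fit both candidate noise models to a model's residuals (ideally on held-out data) and compare their maximized log-likelihoods. The residuals below are synthetic stand-ins and the helper names are hypothetical. Both families have two parameters, so a plain log-likelihood comparison is a fair rough heuristic here; it does not replace cross-validating the metric you actually care about.

```python
import numpy as np

def gaussian_loglik(res):
    """Maximized log-likelihood of residuals under a Gaussian (MLE: mean, std with ddof=0)."""
    mu, sigma, n = res.mean(), res.std(), res.size
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum((res - mu) ** 2) / (2 * sigma**2)

def laplace_loglik(res):
    """Maximized log-likelihood of residuals under a Laplace (MLE: median, mean abs deviation)."""
    m = np.median(res)
    b = np.mean(np.abs(res - m))
    n = res.size
    return -n * np.log(2 * b) - np.sum(np.abs(res - m)) / b

rng = np.random.default_rng(2)
# Synthetic residuals (assumption): replace with y_true - y_pred from your own model.
gauss_res = rng.normal(0.0, 1.0, size=5000)
heavy_res = rng.laplace(0.0, 1.0, size=5000)

for name, res in [("gaussian-like", gauss_res), ("heavy-tailed", heavy_res)]:
    g, l = gaussian_loglik(res), laplace_loglik(res)
    # Higher log-likelihood suggests which noise model (and hence loss) fits better.
    print(name, "suggests", "MSE" if g > l else "L1", round(g, 1), round(l, 1))
```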
