Why use softmax as opposed to standard normalization?

一整个雨季 — asked 2020-12-02 03:43

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

p_i = exp(q_i) / sum_j exp(q_j)

Why is this used instead of standard normalization, i.e. simply dividing each output by the sum of the outputs?

9 Answers
  •  刺人心 (OP) — answered 2020-12-02 04:20

    The values of q_i represent log-likelihoods. In order to recover the probability values, you need to exponentiate them.
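    For illustration, a minimal NumPy sketch of this exponentiate-and-normalize step (the input scores are made-up values) might look like:

    ```python
    import numpy as np

    def softmax(q):
        """Exponentiate log-likelihood scores q and normalize them to sum to 1.

        Subtracting max(q) first does not change the result (the common factor
        exp(-max(q)) cancels in the ratio) but avoids overflow for large scores.
        """
        q = np.asarray(q, dtype=float)
        e = np.exp(q - np.max(q))   # recover (unnormalized) likelihoods
        return e / e.sum()          # normalize into a probability distribution

    # Example: raw network outputs treated as log-likelihood scores
    print(softmax([2.0, 1.0, 0.1]))  # ~[0.659, 0.242, 0.099]
    ```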

    One reason that statistical algorithms often use log-likelihood loss functions is that they are more numerically stable: a product of probabilities can be a vanishingly small floating-point number and may underflow to zero. Working with log-likelihoods, that product of probabilities becomes a sum, which stays well within floating-point range.
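    To make that concrete, here is a small sketch with made-up probabilities: multiplying many small values underflows to 0.0, while summing their logs stays finite:

    ```python
    import numpy as np

    # 1000 hypothetical per-sample probabilities, each fairly small
    p = np.full(1000, 1e-4)

    product = np.prod(p)          # 1e-4000 is far below float64 range -> underflows
    log_sum = np.sum(np.log(p))   # 1000 * log(1e-4) ~ -9210.34, perfectly representable

    print(product)   # 0.0
    print(log_sum)   # -9210.34...
    ```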

    Another reason is that log-likelihoods occur naturally when deriving estimators for random variables that are assumed to be drawn from multivariate Gaussian distributions. See for example the Maximum Likelihood (ML) estimator and the way it is connected to least squares.
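    For a Gaussian with fixed variance, that connection is direct; a standard derivation, sketched briefly:

    $$
    \log \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
    = -\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2 + \text{const},
    $$

    so maximizing the log-likelihood over mu is the same as minimizing the sum of squared errors.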

    As a side note, I think this question is a better fit for the CS Theory or Computational Science Stack Exchange sites.
