Why use softmax as opposed to standard normalization?

一整个雨季 — asked 2020-12-02 03:43

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

p_i = exp(q_i) / sum_j exp(q_j)

Why is this used instead of standard normalization, i.e. simply dividing each output by the sum of the outputs?

9 Answers
  •  刺人心 (OP) — answered 2020-12-02 04:20

    The values of q_i represent log-likelihoods. In order to recover the probability values, you need to exponentiate them.
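    For illustration, a minimal NumPy sketch of this exponentiate-and-normalize step (the input scores are made-up values) might look like:

    ```python
    import numpy as np

    def softmax(q):
        """Exponentiate log-likelihood scores q and normalize them to sum to 1.

        Subtracting max(q) first does not change the result (the common factor
        exp(-max(q)) cancels in the ratio) but avoids overflow for large scores.
        """
        q = np.asarray(q, dtype=float)
        e = np.exp(q - np.max(q))   # recover (unnormalized) likelihoods
        return e / e.sum()          # normalize into a probability distribution

    # Example: raw network outputs treated as log-likelihood scores
    print(softmax([2.0, 1.0, 0.1]))  # ~[0.659, 0.242, 0.099]
    ```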

    One reason that statistical algorithms often use log-likelihood loss functions is that they are more numerically stable: a product of probabilities can be a vanishingly small floating-point number and may underflow to zero. Working with log-likelihoods, that product of probabilities becomes a sum, which stays well within floating-point range.
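    To make that concrete, here is a small sketch with made-up probabilities: multiplying many small values underflows to 0.0, while summing their logs stays finite:

    ```python
    import numpy as np

    # 1000 hypothetical per-sample probabilities, each fairly small
    p = np.full(1000, 1e-4)

    product = np.prod(p)          # 1e-4000 is far below float64 range -> underflows
    log_sum = np.sum(np.log(p))   # 1000 * log(1e-4) ~ -9210.34, perfectly representable

    print(product)   # 0.0
    print(log_sum)   # -9210.34...
    ```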

    Another reason is that log-likelihoods occur naturally when deriving estimators for random variables that are assumed to be drawn from multivariate Gaussian distributions. See for example the Maximum Likelihood (ML) estimator and the way it is connected to least squares.
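    For a Gaussian with fixed variance, that connection is direct; a standard derivation, sketched briefly:

    $$
    \log \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
    = -\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2 + \text{const},
    $$

    so maximizing the log-likelihood over mu is the same as minimizing the sum of squared errors.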

    As a side note, I think this question is a better fit for the CS Theory or Computational Science Stack Exchange sites.
