Why use softmax only in the output layer and not in hidden layers?

情歌与酒 · 2020-12-29 10:00

Most examples of neural networks for classification tasks I've seen use a softmax layer as the output activation function. Normally, the hidden units use a sigmoid, tanh, or ReLU function. Why not use softmax in the hidden layers as well?

5 Answers
  •  难免孤独
    2020-12-29 10:30

    Use a softmax activation wherever you want to model a multinomial distribution. This is usually an output layer y, but it can also be an intermediate layer, say a multinomial latent variable z. As mentioned in this thread, for outputs {o_i} the constraint sum({o_i}) = 1 introduces a linear dependency among the units, which is intentional at this layer but would be an unusual restriction for a hidden representation. Additional layers may provide desired sparsity and/or feature independence downstream.
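The normalization constraint described above is easy to see numerically. Here is a minimal sketch (the logit values are illustrative): softmax output components always sum to 1, so the last component is fully determined by the others.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    # because softmax is invariant to adding a constant to all logits.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p)        # a valid probability distribution over 3 classes
print(p.sum())  # sums to 1 (up to floating-point error): the linear dependency
```

With n outputs, only n - 1 of them are free to vary, which is exactly what you want when representing a probability distribution, but a needless restriction on a hidden layer's features.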

    Page 198 of Deep Learning (Goodfellow, Bengio, Courville):

    Any time we wish to represent a probability distribution over a discrete variable with n possible values, we may use the softmax function. This can be seen as a generalization of the sigmoid function which was used to represent a probability distribution over a binary variable. Softmax functions are most often used as the output of a classifier, to represent the probability distribution over n different classes. More rarely, softmax functions can be used inside the model itself, if we wish the model to choose between one of n different options for some internal variable.
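The "internal variable" case the quoted passage mentions can be sketched as a softmax gate that softly selects among n candidate hidden representations. This is a hypothetical illustration (the variable names and sizes are invented, not from the book), in the spirit of mixture or attention mechanisms:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
candidates = rng.normal(size=(3, 4))    # n = 3 internal options, 4 features each
gate_logits = np.array([0.5, 2.0, -1.0])  # produced by some earlier layer (assumed)

weights = softmax(gate_logits)   # probability distribution over the 3 options
blended = weights @ candidates   # convex combination: a soft "choice" of option
```

Because the gate outputs are a valid probability distribution, the model can make a differentiable, soft version of "choosing one of n options", which is what makes softmax occasionally useful inside the network rather than only at the output.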
