In the output layer of a neural network, it is typical to use the softmax function, so that the output activations form a probability distribution:

$$a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}}.$$
Suppose we change the softmax function so that the output activations are given by

$$a^L_j = \frac{e^{c z^L_j}}{\sum_k e^{c z^L_k}},$$

where $c$ is a positive constant. Note that $c = 1$ corresponds to the standard softmax function. But if we use a different value of $c$ we get a different function, which is nonetheless qualitatively rather similar to the softmax. In particular, show that the output activations form a probability distribution, just as for the usual softmax. Suppose we allow $c$ to become large, i.e., $c \to \infty$. What is the limiting value for the output activations $a^L_j$? After solving this problem it should be clear why we think of the $c = 1$ function as a "softened" version of the maximum function. This is the origin of the term "softmax". You can follow the details in this source (equation 83).
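As a quick numerical illustration (not part of the original exercise), here is a small Python/NumPy sketch of the modified softmax; the function name `softmax_c` and the example inputs are my own choices. It shows both claims in action: the activations always sum to 1, and as $c$ grows the distribution concentrates on the largest $z^L_j$.

```python
import numpy as np

def softmax_c(z, c=1.0):
    """Modified softmax: a_j = exp(c * z_j) / sum_k exp(c * z_k)."""
    # Subtract the max before exponentiating for numerical stability;
    # this leaves the result unchanged because the shift cancels in the ratio.
    shifted = c * (z - np.max(z))
    e = np.exp(shifted)
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0, 0.5])  # example weighted inputs z^L_j

for c in [1, 5, 50]:
    a = softmax_c(z, c)
    print(f"c={c:>3}: a={np.round(a, 4)}  sum={a.sum():.6f}")
# As c grows, the output approaches a one-hot vector picking out the largest
# z_j (here index 2), while always summing to 1 -- the "softened" maximum
# hardens into an actual maximum.
```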