Why use softmax as opposed to standard normalization?

一整个雨季 2020-12-02 03:43

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution: softmax(z)_i = exp(z_i) / sum_j exp(z_j).
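To make the comparison in the title concrete, here is a minimal sketch that puts softmax next to a plain divide-by-the-sum normalization. It assumes "standard normalization" means dividing each score by the total, which is my reading of the question rather than something it spells out; the function names `softmax` and `sum_normalize` are made up for illustration.

```python
import math


def softmax(logits):
    """Exponentiate each score, then divide by the sum of the exponentials."""
    # Subtracting the max does not change the result but avoids overflow in exp.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sum_normalize(scores):
    """Assumed meaning of 'standard normalization': divide each score by the sum.

    This only yields a probability distribution when every score is
    non-negative and the sum is positive, so other inputs are rejected.
    """
    total = sum(scores)
    if total <= 0 or any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative with a positive sum")
    return [s / total for s in scores]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]
    print("softmax:      ", softmax(scores))
    print("sum-normalize:", sum_normalize(scores))
    # Softmax accepts negative scores; plain normalization does not.
    print("softmax with negatives:", softmax([-1.0, 0.0, 1.0]))
```

One visible difference is that softmax works on any real-valued scores, while dividing by the sum needs the scores shifted or clipped to be non-negative first.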

9 Answers

悲&欢浪女 (OP)

2020-12-02 04:09

The choice of the softmax function seems somehow arbitrary as there are many other possible normalizing functions. It is thus unclear why the log-softmax loss would perform better than other loss alternatives.

From "An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family" https://arxiv.org/abs/1511.05042

The authors explored several alternatives, including a Taylor expansion of exp and the so-called spherical softmax, and found that these sometimes perform better than the usual softmax.
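As a rough illustration of what these alternatives look like, here is a small sketch. It assumes the Taylor variant replaces exp(z) with its second-order expansion 1 + z + z^2/2 and that the spherical variant normalizes the squared scores plus a small epsilon; that is how I understand the paper, so check it for the exact definitions. The function names and the default `eps` value are my own choices.

```python
import math


def softmax(z):
    """Usual softmax, shifted by the max for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def taylor_softmax(z):
    """Assumed form: normalize the second-order Taylor expansion of exp.

    1 + v + v^2/2 equals ((v + 1)^2 + 1) / 2, so every numerator is positive.
    """
    nums = [1.0 + v + 0.5 * v * v for v in z]
    total = sum(nums)
    return [n / total for n in nums]


def spherical_softmax(z, eps=1e-6):
    """Assumed form: normalize squared scores; eps keeps the sum nonzero."""
    nums = [v * v + eps for v in z]
    total = sum(nums)
    return [n / total for n in nums]


if __name__ == "__main__":
    scores = [3.0, 1.0, -3.0]
    print("softmax:  ", softmax(scores))
    print("taylor:   ", taylor_softmax(scores))
    print("spherical:", spherical_softmax(scores))
```

Under these assumed forms, the spherical variant depends only on the magnitude of each score, so 3.0 and -3.0 get the same probability, which softmax would never do.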
