Why is the softmax function necessary? Why not simple normalization?

Submitted by 倖福魔咒の on 2019-12-11 05:15:15

Question


I am not familiar with deep learning, so this might be a beginner question. As I understand it, the softmax function in Multi Layer Perceptrons is in charge of normalizing the outputs and distributing a probability to each class. If so, why don't we use simple normalization?

Let's say we get a vector x = (10, 3, 2, 1). Applying softmax, the output will be y = (0.9986, 0.0009, 0.0003, 0.0001).

Applying simple normalization (dividing each element by the sum, 16), the output will be y = (0.625, 0.1875, 0.125, 0.0625).

It seems like simple normalization could also distribute the probabilities. So what is the advantage of using the softmax function on the output layer? (A small numerical check of both follows below.)
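To reproduce the numbers above, here is a minimal sketch in NumPy, using the vector from the question:

    import numpy as np

    x = np.array([10.0, 3.0, 2.0, 1.0])

    # Softmax: exponentiate each element, then normalize by the sum of exponentials.
    softmax = np.exp(x) / np.exp(x).sum()
    print(softmax.round(4))  # [0.9986 0.0009 0.0003 0.0001]

    # Simple normalization: divide each element by the plain sum (16).
    simple = x / x.sum()
    print(simple.round(4))   # [0.625  0.1875 0.125  0.0625]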


Answer 1:


Normalization does not always produce probabilities: for example, it doesn't work when some of the values are negative. And what if the sum of the values is zero?

But exponentiating the logits changes that: the exponential is strictly positive, so the sum can never be zero, and it maps the full range of the logits (any real number) into valid probabilities. So softmax is preferred because it actually works.
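A minimal sketch of those failure cases in NumPy (the example vectors are my own, chosen only to illustrate the point):

    import numpy as np

    def simple_normalize(x):
        return x / x.sum()

    def softmax(x):
        e = np.exp(x - x.max())  # shifting by the max avoids overflow; it does not change the result
        return e / e.sum()

    neg = np.array([1.0, -1.0, 2.0])
    print(simple_normalize(neg))       # [ 0.5 -0.5  1. ]  -- a negative "probability"

    zero_sum = np.array([2.0, -1.0, -1.0])
    # simple_normalize(zero_sum) divides by zero and blows up, but softmax is fine:
    print(softmax(neg).round(4))       # [0.2595 0.0351 0.7054]
    print(softmax(zero_sum).round(4))  # [0.9094 0.0453 0.0453]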




Answer 2:


This depends on the training loss function. Many models are trained with a log-loss objective, so the values you see in that vector estimate the log of each class probability. SoftMax then merely converts those values back to linear scale and normalizes them.
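As a minimal illustration of that view (the distribution p below is invented for the example): if the logits are logs of probabilities, exponentiating and normalizing recovers them exactly, and adding a constant to all logits changes nothing:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())  # max-shift for numerical stability; the result is unchanged
        return e / e.sum()

    p = np.array([0.7, 0.2, 0.1])   # a made-up class distribution
    logits = np.log(p)              # the log values a log-loss-trained model would estimate

    # Softmax converts the logs back to linear values and normalizes:
    print(softmax(logits).round(2))        # [0.7 0.2 0.1]

    # It is shift-invariant, so only relative logit values matter:
    print(softmax(logits + 5.0).round(2))  # [0.7 0.2 0.1]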

The empirical reason is simple: SoftMax is used where it produces better results.



Source: https://stackoverflow.com/questions/45965817/why-is-softmax-function-necessory-why-not-simple-normalization
