Neural networks are used for pattern recognition, and pattern recognition is an inherently non-linear task.
Suppose, for the sake of argument, that we use a linear activation function y = wX + b for every single neuron and classify with a simple rule: if y > 0, predict class 1, otherwise class 0.
Now we can compute our loss using squared error and backpropagate it so that the model learns well, correct?
WRONG.
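Here is why. To make the setup concrete, below is a minimal NumPy sketch of the all-linear network just described; the layer sizes, weight values, input, and target are illustrative assumptions, not a specific implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 2 inputs -> 4 -> 4 -> 1 output, every layer purely linear.
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    # Each "activation" is just the identity: y = wX + b, no ReLU/sigmoid/tanh anywhere.
    h1 = W1 @ x + b1
    h2 = W2 @ h1 + b2
    return W3 @ h2 + b3

x, target = np.array([0.5, -1.0]), 1.0
y = forward(x)                       # shape (1,)
pred_class = 1 if y[0] > 0 else 0    # the threshold rule from the text
loss = 0.5 * (y[0] - target) ** 2    # squared error loss
print(pred_class, loss)
```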
For the last hidden layer, the update is roughly w{l} = w{l} - (alpha)*(error)*X, where (error) is the error term propagated from the output.
For the second-to-last hidden layer, the update becomes w{l-1} = w{l-1} - (alpha)*(error)*w{l}*X.
For the i-th hidden layer counting back from the output, the update is w{i} = w{i} - (alpha)*(error)*w{l}*w{l-1}*...*w{i+1}*X.
In other words, the gradient reaching an early layer is scaled by the product of all the downstream weight matrices, which leaves three possibilities (a numeric sketch follows the list):
A) w{i} barely changes, due to a vanishing gradient
B) w{i} changes dramatically and inaccurately, due to an exploding gradient
C) w{i} changes by just the right amount and gives a good fit
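Here is a quick numeric sketch of what that product of weights does to the gradient signal, using a chain of hypothetical 1-unit linear layers; the weight values (0.5, 1.5, 1.0) and the depth of 20 are illustrative assumptions.

```python
import numpy as np

def gradient_scale(w_value, n_layers):
    # The error signal reaching layer i is multiplied by every weight between
    # layer i and the output: w{l} * w{l-1} * ... * w{i+1}.
    return np.prod(np.full(n_layers, w_value))

print(gradient_scale(0.5, 20))   # ~9.5e-07  -> case A, vanishing gradient
print(gradient_scale(1.5, 20))   # ~3.3e+03  -> case B, exploding gradient
print(gradient_scale(1.0, 20))   # 1.0       -> case C, the lucky regime
```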
If case C happens, it means our classification/prediction problem was most probably simple enough for a plain linear or logistic regression and never required a neural network in the first place, since a composition of linear layers is itself just one linear layer.
No matter how robust or well hyper-tuned your NN is, if you use a linear activation function you will never be able to tackle pattern recognition problems that require non-linearity, as the sketch below demonstrates.
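Below is a minimal sketch showing that a stack of linear layers collapses into a single linear map; the weights are illustrative random values and biases are omitted for brevity (with biases the composition is still just one affine map).

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 2))
W2 = rng.normal(size=(4, 4))
W3 = rng.normal(size=(1, 4))

x = rng.normal(size=2)
deep = W3 @ (W2 @ (W1 @ x))          # three "layers", all with linear activations
collapsed = (W3 @ W2 @ W1) @ x       # one equivalent single linear layer
print(np.allclose(deep, collapsed))  # True: the extra depth added nothing
```

Since a single linear map cannot separate a non-linear pattern such as XOR, adding more linear layers buys nothing; only a non-linear activation (ReLU, sigmoid, tanh, etc.) gives the network real expressive power.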