Avoiding vanishing gradient in deep neural networks

Submitted by 安稳与你 on 2019-12-08 14:57:27

No, the vanishing gradient problem is not as prevalent as it used to be, because pretty much all modern networks (except recurrent ones) use ReLU activations, which are considerably less prone to it. The derivative of ReLU is exactly 1 for positive inputs, so backpropagating through many layers does not shrink the gradient multiplicatively the way the small derivatives of saturating sigmoid or tanh activations do.
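As a minimal sketch of what this looks like in practice (assuming PyTorch; the class name `DeepReLUNet`, the layer sizes, and the dummy loss are illustrative choices, not anything from the original question), here is a deep feed-forward network built from `nn.Linear` + `nn.ReLU` blocks, with a quick check of the gradient magnitude in the layer furthest from the output:

```python
# A sketch of a deep fully connected net using ReLU activations.
# With sigmoid/tanh in every layer, the gradient at the earliest layers
# would typically be orders of magnitude smaller.
import torch
import torch.nn as nn

class DeepReLUNet(nn.Module):                      # hypothetical name for illustration
    def __init__(self, in_dim=784, hidden=256, depth=10, out_dim=10):
        super().__init__()
        layers = []
        dim = in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = DeepReLUNet()
x = torch.randn(32, 784)            # dummy input batch
loss = model(x).pow(2).mean()       # dummy loss, just to backpropagate something
loss.backward()

# Mean absolute gradient of the first (input-side) layer's weights:
print(model.net[0].weight.grad.abs().mean())
```

Swapping `nn.ReLU()` for `nn.Sigmoid()` in the loop above and comparing the printed value is a simple way to see the effect for yourself.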

You should just train a network from scratch and see how it works. Do not try to deal with a problem that you don't have yet.
