Question
How are the many thinned networks resulting from dropout averaged? And which weights are used during the testing stage? I'm really confused about this, because each thinned network would learn a different set of weights. Is backpropagation done separately for each of the thinned networks? And how exactly are weights shared among these thinned networks? At testing time only one neural network and one set of weights are used, so which set of weights is used?
It is said that a different thinned network is trained for each training case. What exactly is meant by a training case? Does each forward and backward pass train a different thinned network once, and then the next forward and backward pass trains another one? How are the weights learned?
Answer 1:
While Training:
In dropout, you force some fraction (the dropout probability) of that layer's activations/outputs to zero. Usually a boolean mask is created to drop these activations, and the same mask is reused during backpropagation. So gradients are applied only to the weights that were actually used in the forward pass.
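To make the mask mechanics concrete, here is a minimal NumPy sketch (the names dropout_forward, dropout_backward, and p_drop are mine, not from the answer): the boolean mask sampled in the forward pass is reused in the backward pass, so only the kept units receive gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p_drop, training=True):
    """Training-time dropout: zero each activation independently with
    probability p_drop, and return the boolean mask so the backward
    pass can route gradients only through the units that were kept."""
    if not training:
        return activations, None
    mask = rng.random(activations.shape) >= p_drop  # True = keep the unit
    return activations * mask, mask

def dropout_backward(grad_out, mask):
    """Gradients flow only through the units kept in the forward pass;
    dropped units contribute zero gradient to the upstream weights."""
    return grad_out * mask
```

Because the mask is resampled on every forward pass, each training step effectively updates a different "thinned" sub-network, yet all of these sub-networks share the same underlying weight matrices. That is why there is only one set of weights at test time.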
While Testing:
All weights are used. All neurons are kept (no dropout), but the activations/outputs of that layer are scaled by p (here p is the probability that a unit was kept during training) to normalize the whole output of that layer, so its expected value matches what the next layer saw during training.
It's just one network; see the figure in the dropout paper (https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf).
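Continuing the sketch above (same caveat: the function names are invented for illustration), test-time dropout keeps every unit and rescales by the keep probability, so the deterministic test output equals the expected training output:

```python
def dropout_test(activations, p_drop):
    """Test-time dropout: keep all units, but scale by the keep
    probability (1 - p_drop) so the output matches the training-time
    expectation, where each unit survived only (1 - p_drop) of the time."""
    return activations * (1.0 - p_drop)

# Sanity check: averaged over many sampled masks, the training output
# approaches the deterministic test output.
x = rng.standard_normal(10)
avg = np.mean([dropout_forward(x, 0.5)[0] for _ in range(10000)], axis=0)
print(np.allclose(avg, dropout_test(x, 0.5), atol=0.05))  # True
```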
Issues: I don't understand what you mean by thinned networks.
I hope this helps.
Source: https://stackoverflow.com/questions/44030753/training-with-dropout