Training with dropout
Question: How are the many thinned networks resulting from dropout averaged? And which weights are used during the testing stage? I'm really confused about this one, because each thinned network would learn a different set of weights. Is backpropagation done separately for each of the thinned networks? And how exactly are weights shared among these thinned networks? At testing time only one neural network and one set of weights are used, so which set of weights is that? It is said that a
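A minimal NumPy sketch may clarify the confusion in the question. The key point (as in the standard dropout formulation) is that there is only ever one shared weight matrix: each "thinned" network is just a random mask applied to the units, so every backpropagation step updates the same shared weights. At test time the full network is used with activations scaled by the keep-probability `p`, which approximates averaging the outputs of all thinned networks. The names `forward_train`, `forward_test`, and the single-layer setup here are illustrative assumptions, not anything from the question itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix: every thinned network uses these same weights.
W = rng.standard_normal((4, 3))
x = rng.standard_normal(4)
p = 0.5  # probability of keeping a unit (assumed value for illustration)

def forward_train(x, W, p, rng):
    # Sample one thinned network: randomly mask input units.
    # Gradients flow only through the kept units, but they all
    # accumulate into the single shared matrix W.
    mask = rng.random(x.shape) < p
    return (x * mask) @ W

def forward_test(x, W, p):
    # Test time: use the full network once, scaling activations by p.
    # This approximates the average output over all thinned networks.
    return (x * p) @ W

# Empirically, averaging many sampled thinned-network outputs
# approaches the single scaled test-time forward pass.
avg = np.mean([forward_train(x, W, p, rng) for _ in range(20000)], axis=0)
assert np.allclose(avg, forward_test(x, W, p), atol=0.05)
```

So no explicit averaging of many separate weight sets ever happens; the "averaging" is implicit in scaling the one shared set of learned weights by `p` at test time.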