Ordering of batch normalization and dropout?

小蘑菇 2020-12-12 08:10

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow.

9 answers
  •  天命终不由人
    2020-12-12 08:57

    In Ioffe and Szegedy (2015), the authors state that "we would like to ensure that for any parameter values, the network always produces activations with the desired distribution". So the batch normalization layer is actually inserted right after a Conv layer/fully connected layer, but before feeding into the ReLU (or any other kind of) activation. See this video at around the 53-minute mark for more details.

    As far as dropout goes, I believe dropout is applied after the activation layer. In Figure 3b of the dropout paper, the dropout mask/probability vector r(l) for hidden layer l is applied to y(l), where y(l) is the output of layer l after applying the activation function f.

    So in summary, the order of using batch normalization and dropout is:

    -> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC ->
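
    As a concrete illustration, here is a minimal sketch of that ordering using the tf.keras layer API (the layer sizes, dropout rate, and input shape are illustrative assumptions, not part of the original answer):

        import tensorflow as tf

        # CONV -> BatchNorm -> ReLU -> Dropout, repeated, as described above.
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(32, 32, 3)),   # assumed input shape
            tf.keras.layers.Conv2D(64, 3, padding="same", use_bias=False),  # CONV (bias is redundant before BN)
            tf.keras.layers.BatchNormalization(),        # normalize the pre-activations
            tf.keras.layers.ReLU(),                      # activation comes after BN
            tf.keras.layers.Dropout(0.25),               # dropout after the activation
            tf.keras.layers.Conv2D(128, 3, padding="same", use_bias=False),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dropout(0.25),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.summary()

    Note that both BatchNormalization and Dropout behave differently at inference time (moving statistics are used and no units are dropped); Keras switches this automatically via the training flag.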
