Ordering of batch normalization and dropout?

小蘑菇 2020-12-12 08:10

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow.

9 answers
  •  孤街浪徒
    2020-12-12 09:11

    I found a paper that explains the disharmony between Dropout and Batch Normalization (BN): "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift". The key idea is what the authors call the "variance shift": dropout behaves differently during training and testing, which shifts the input statistics that BN learns during training so that they no longer match what the layer sees at inference time. The main idea is summarized in a figure from that paper.
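
    To make the ordering concrete, here is a minimal tf.keras sketch contrasting the two placements; the layer sizes, dropout rate, and model names are illustrative and not taken from the answer:

        import tensorflow as tf
        from tensorflow.keras import layers

        # Dropout *before* BatchNormalization: BN's moving statistics are
        # estimated on dropped-out activations, which no longer match what the
        # layer receives at inference time (dropout is disabled there)
        # -> variance shift.
        dropout_before_bn = tf.keras.Sequential([
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            layers.BatchNormalization(),
            layers.Dense(10),
        ])

        # Dropout *after* BatchNormalization: BN collects its statistics on the
        # undisturbed activations, so its train- and test-time inputs agree.
        bn_before_dropout = tf.keras.Sequential([
            layers.Dense(128, activation="relu"),
            layers.BatchNormalization(),
            layers.Dropout(0.5),
            layers.Dense(10),
        ])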

    A small demo of this effect can be found in this notebook.
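
    For reference, a minimal NumPy sketch of the same effect, assuming standard inverted dropout with keep probability 0.5 applied to unit-variance activations:

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.normal(size=1_000_000)  # unit-variance activations feeding a BN layer
        p_keep = 0.5                    # keep probability of inverted dropout

        # Training: inverted dropout preserves the mean but inflates the variance
        # (Var = E[x^2]/p_keep - E[x]^2, i.e. ~2.0 here instead of ~1.0).
        mask = rng.random(x.shape) < p_keep
        x_train = x * mask / p_keep

        # Testing: dropout is a no-op, so BN normalizes with statistics that were
        # estimated on the inflated train-time variance -> mis-scaled outputs.
        x_test = x

        print("train-time variance:", x_train.var())  # ~2.0
        print("test-time  variance:", x_test.var())   # ~1.0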
