Ordering of batch normalization and dropout?

小蘑菇 2020-12-12 08:10

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow.

9 answers
  •  孤街浪徒
    2020-12-12 09:11

    I found a paper that explains the disharmony between Dropout and Batch Normalization (BN): "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift". The key idea is what the authors call the "variance shift": dropout behaves differently during training and testing, which shifts the input statistics that BN learns during training so that they no longer match what the layer sees at inference time. The main idea is summarized in a figure from that paper.
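
    To make the ordering concrete, here is a minimal tf.keras sketch contrasting the two placements; the layer sizes, dropout rate, and model names are illustrative and not taken from the answer:

        import tensorflow as tf
        from tensorflow.keras import layers

        # Dropout *before* BatchNormalization: BN's moving statistics are
        # estimated on dropped-out activations, which no longer match what the
        # layer receives at inference time (dropout is disabled there)
        # -> variance shift.
        dropout_before_bn = tf.keras.Sequential([
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            layers.BatchNormalization(),
            layers.Dense(10),
        ])

        # Dropout *after* BatchNormalization: BN collects its statistics on the
        # undisturbed activations, so its train- and test-time inputs agree.
        bn_before_dropout = tf.keras.Sequential([
            layers.Dense(128, activation="relu"),
            layers.BatchNormalization(),
            layers.Dropout(0.5),
            layers.Dense(10),
        ])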

    A small demo of this effect can be found in this notebook.
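
    For reference, a minimal NumPy sketch of the same effect, assuming standard inverted dropout with keep probability 0.5 applied to unit-variance activations:

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.normal(size=1_000_000)  # unit-variance activations feeding a BN layer
        p_keep = 0.5                    # keep probability of inverted dropout

        # Training: inverted dropout preserves the mean but inflates the variance
        # (Var = E[x^2]/p_keep - E[x]^2, i.e. ~2.0 here instead of ~1.0).
        mask = rng.random(x.shape) < p_keep
        x_train = x * mask / p_keep

        # Testing: dropout is a no-op, so BN normalizes with statistics that were
        # estimated on the inflated train-time variance -> mis-scaled outputs.
        x_test = x

        print("train-time variance:", x_train.var())  # ~2.0
        print("test-time  variance:", x_test.var())   # ~1.0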
