Ordering of batch normalization and dropout?

小蘑菇 2020-12-12 08:10

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow.

9 Answers
  •  执念已碎
    2020-12-12 09:06

    I read the papers recommended in the answer and the comments on https://stackoverflow.com/a/40295999/8625228.

    From Ioffe and Szegedy (2015)’s point of view, use only BN in the network structure. Li et al. (2018) give statistical and experimental analyses showing that a variance shift occurs when practitioners use Dropout before BN. Thus, Li et al. (2018) recommend applying Dropout after all BN layers.
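
    A minimal Keras sketch of that ordering (not part of the original answer; the layer sizes and the 0.5 dropout rate are illustrative assumptions): every BN layer comes first, and the single Dropout layer sits after all of them, just before the output layer.

        import tensorflow as tf

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(784,)),
            tf.keras.layers.Dense(256),
            tf.keras.layers.BatchNormalization(),  # BN before the activation, per Ioffe and Szegedy (2015)
            tf.keras.layers.Activation("relu"),
            tf.keras.layers.Dense(128),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Activation("relu"),
            tf.keras.layers.Dropout(0.5),          # Dropout only after all BN layers, per Li et al. (2018)
            tf.keras.layers.Dense(10, activation="softmax"),
        ])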

    From Ioffe and Szegedy (2015)’s point of view, BN is located inside/before the activation function. However, Chen et al. (2019) use an IC layer that combines Dropout and BN, and they recommend placing BN after ReLU.
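
    A minimal sketch of the IC (Independent Component) layer idea from Chen et al. (2019), again assuming TensorFlow/Keras; the class name ICLayer and the 0.1 dropout rate are illustrative assumptions, not names from the paper's code:

        import tensorflow as tf

        class ICLayer(tf.keras.layers.Layer):
            """BN followed by Dropout, applied after the activation."""

            def __init__(self, rate=0.1, **kwargs):
                super().__init__(**kwargs)
                self.bn = tf.keras.layers.BatchNormalization()
                self.dropout = tf.keras.layers.Dropout(rate)

            def call(self, inputs, training=None):
                x = self.bn(inputs, training=training)
                return self.dropout(x, training=training)

        # Usage: Dense -> ReLU -> IC -> Dense, i.e. BN sits after ReLU.
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(784,)),
            tf.keras.layers.Dense(256, activation="relu"),
            ICLayer(rate=0.1),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])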

    To be safe, I use only Dropout or only BN in a network, not both.

    Chen, Guangyong, Pengfei Chen, Yujun Shi, Chang-Yu Hsieh, Benben Liao, and Shengyu Zhang. 2019. “Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks.” CoRR abs/1905.05928. http://arxiv.org/abs/1905.05928.

    Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” CoRR abs/1502.03167. http://arxiv.org/abs/1502.03167.

    Li, Xiang, Shuo Chen, Xiaolin Hu, and Jian Yang. 2018. “Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift.” CoRR abs/1801.05134. http://arxiv.org/abs/1801.05134.
