Ordering of batch normalization and dropout?


The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the


9 Answers

孤街浪徒

2020-12-12 09:11

I found a paper that explains the disharmony between Dropout and Batch Normalization (BN). The key idea is what the authors call the "variance shift": dropout behaves differently in the training and testing phases, which shifts the input statistics that BN learns during training. The main idea can be found in this figure, which is taken from this paper.

A small demo for this effect can be found in this notebook.
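The variance shift can also be reproduced without any framework. The following is a minimal sketch, not the notebook mentioned above. It assumes inverted dropout (kept units are scaled by `1 / keep_prob` during training, identity at test time), zero-mean unit-variance Gaussian inputs, and a hypothetical `keep_prob` of 0.5; all function names are made up for illustration.

```python
import random


def variance(xs):
    """Population variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def dropout_train(xs, keep_prob, rng):
    """Inverted dropout in training mode: drop units, rescale the survivors."""
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]


def dropout_test(xs):
    """Inverted dropout in inference mode is the identity."""
    return list(xs)


rng = random.Random(0)
keep_prob = 0.5  # assumed value, chosen only for illustration
xs = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

var_train = variance(dropout_train(xs, keep_prob, rng))
var_test = variance(dropout_test(xs))

# A BN layer placed after dropout would accumulate statistics close to
# var_train during training, but receive inputs with var_test at inference.
print(f"variance seen in training:  {var_train:.3f}")
print(f"variance seen at inference: {var_test:.3f}")
```

For zero-mean inputs the training-time variance is larger by a factor of `1 / keep_prob`, so the running variance stored by a following BN layer no longer matches what that layer sees at test time.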


野性不改

2020-12-12 09:11

Based on the research paper, for better performance we should apply BN before Dropout.


我在风中等你

2020-12-12 09:15

As noted in the comments, an amazing resource to read up on the order of layers is here. I have gone through the comments, and it is the best resource on the topic I have found on the internet.

My 2 cents:

Dropout is meant to block information from certain neurons completely, so that neurons do not co-adapt. Batch normalization therefore has to come after dropout; otherwise, information from the dropped neurons still passes through the normalization statistics.

If you think about it, this is the same reason why, in typical ML problems, we don't compute the mean and standard deviation over the entire dataset and then split it into train, validation, and test sets. We split first, compute the statistics over the train set only, and use them to center and scale the validation and test sets.
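The split-then-normalize analogy can be sketched as follows. This is only an illustration on made-up data: the 70/15/15 split, the Gaussian toy dataset, and the helper name `standardize` are assumptions, not something from the question.

```python
import random
import statistics

rng = random.Random(0)
# Toy 1-D dataset, invented for illustration.
data = [rng.gauss(5.0, 2.0) for _ in range(1000)]
rng.shuffle(data)

# Split FIRST, so no validation/test information leaks into the statistics.
train, val, test = data[:700], data[700:850], data[850:]

# Statistics come from the train split only.
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)


def standardize(xs):
    """Center and scale using the train-set statistics."""
    return [(x - train_mean) / train_std for x in xs]


train_n = standardize(train)
val_n = standardize(val)
test_n = standardize(test)

print(f"train mean/std after scaling: "
      f"{statistics.fmean(train_n):.3f} / {statistics.pstdev(train_n):.3f}")
print(f"test mean/std after scaling:  "
      f"{statistics.fmean(test_n):.3f} / {statistics.pstdev(test_n):.3f}")
```

The train split ends up with exactly zero mean and unit standard deviation, while the validation and test splits are only approximately standardized, which is expected because their own statistics were never used.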

So I suggest Scheme 1 (this takes pseudomarvin's comment on the accepted answer into consideration):

-> CONV/FC -> ReLU (or other activation) -> Dropout -> BatchNorm -> CONV/FC ->

as opposed to Scheme 2, from the accepted answer:

-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC ->

Please note that this implies a network under Scheme 2 should overfit more than a network under Scheme 1. However, the OP ran some tests, as mentioned in the question, and they support Scheme 2.
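To make the two orderings concrete, here is a framework-free sketch of the part between the two CONV/FC layers, in training mode. It makes several simplifying assumptions: a single feature per batch, inverted dropout, BN without the learnable scale and shift, and a hypothetical `keep_prob` of 0.5. In TensorFlow the same orderings would be expressed with the framework's own activation, dropout, and batch-norm layers; the function names below are invented for illustration.

```python
import math
import random

EPS = 1e-5  # small constant for numerical stability, value assumed


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def relu(xs):
    return [max(0.0, x) for x in xs]


def dropout(xs, keep_prob, rng):
    """Inverted dropout, training mode."""
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]


def batch_norm(xs):
    """Training-mode BN over the batch, without learnable gamma/beta."""
    m, v = mean(xs), variance(xs)
    return [(x - m) / math.sqrt(v + EPS) for x in xs]


def scheme_1(pre_activations, keep_prob, rng):
    # CONV/FC output -> ReLU -> Dropout -> BatchNorm
    return batch_norm(dropout(relu(pre_activations), keep_prob, rng))


def scheme_2(pre_activations, keep_prob, rng):
    # CONV/FC output -> BatchNorm -> ReLU -> Dropout
    return dropout(relu(batch_norm(pre_activations)), keep_prob, rng)


rng = random.Random(0)
# Stand-in for the output of a CONV/FC layer on one batch.
batch = [rng.gauss(1.0, 3.0) for _ in range(10_000)]

out_1 = scheme_1(batch, 0.5, rng)
out_2 = scheme_2(batch, 0.5, rng)

print(f"Scheme 1 output: mean={mean(out_1):.3f}, var={variance(out_1):.3f}")
print(f"Scheme 2 output: mean={mean(out_2):.3f}, var={variance(out_2):.3f}")
```

Under Scheme 1 the next CONV/FC layer receives a normalized input during training, but the BN statistics are computed on dropout-masked activations. Under Scheme 2 the BN statistics are computed on the full batch, and the next layer receives the masked, non-negative activations directly. The sketch shows only what each ordering computes; it does not settle which one generalizes better.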
