Is it normal to use batch normalization in an RNN/LSTM?

终归单人心 · 2020-12-29 07:00

I am a beginner in deep learning. I know that in regular neural nets people apply batch norm before the activation, and that it reduces the reliance on good weight initialization. I wonder whether it is also normal to use batch normalization in RNNs/LSTMs.

5 Answers
  •  北荒 · 2020-12-29 07:44

    In any non-recurrent network (convolutional or not), batch norm lets each layer adjust the scale and mean of its incoming activations, so the input distribution to each layer doesn't keep shifting during training. That reduction in distribution shift is the advantage the authors of the BN paper claim for the technique.
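
    A minimal PyTorch sketch of that per-layer normalization (the layer sizes are my own, purely for illustration): every layer owns its own BatchNorm, so each layer independently re-normalizes the scale and mean of its input.

    ```python
    import torch
    import torch.nn as nn

    # Each layer gets its OWN BatchNorm parameters and running statistics,
    # so it can re-normalize its incoming distribution independently.
    mlp = nn.Sequential(
        nn.Linear(128, 256),
        nn.BatchNorm1d(256),   # gamma/beta/running stats for this layer only
        nn.ReLU(),
        nn.Linear(256, 64),
        nn.BatchNorm1d(64),    # a separate set for the next layer
        nn.ReLU(),
    )

    x = torch.randn(32, 128)   # hypothetical batch of 32 samples
    y = mlp(x)                 # each BN layer normalizes over the batch dimension
    ```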

    The problem with doing this to the recurrent outputs of an RNN is that the normalization parameters are shared across all timesteps, which are effectively layers in backpropagation through time (BPTT). The distribution is therefore forced to be the same across those temporal layers. That may not make sense when the data has structure that varies in a non-random way along the sequence: if the time series is a sentence, for example, certain words are much more likely to appear at the beginning or the end. Fixing the distribution across timesteps can therefore reduce the effectiveness of BN.
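
    Here is a hedged sketch of that sharing (module choices and sizes are my own assumptions, not from the question): applying BatchNorm1d to an LSTM's per-timestep outputs uses one set of affine parameters and running statistics for every timestep, whereas LayerNorm, the usual choice inside recurrent nets, normalizes each timestep's feature vector on its own.

    ```python
    import torch
    import torch.nn as nn

    batch, seq_len, input_size, hidden_size = 32, 10, 16, 64

    lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
    bn = nn.BatchNorm1d(hidden_size)   # ONE gamma/beta + running stats for all timesteps
    ln = nn.LayerNorm(hidden_size)     # normalizes each timestep's features independently

    x = torch.randn(batch, seq_len, input_size)
    out, _ = lstm(x)                   # (batch, seq_len, hidden_size)

    # BatchNorm1d expects (N, C, L): features become channels, time becomes
    # "length", so statistics are pooled over batch AND time. This is the
    # sharing across the BPTT "layers" described above.
    out_bn = bn(out.permute(0, 2, 1)).permute(0, 2, 1)

    # LayerNorm keeps no batch/time-shared statistics, which is one reason it
    # is the more common normalization inside recurrent models.
    out_ln = ln(out)
    ```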
