I am a beginner in deep learning. I know that in regular neural nets people use batch norm before the activation, and that it reduces the reliance on good weight initialization. I wonder how batch normalization should be applied in recurrent neural nets.
Batch normalization applied to RNNs is similar to batch normalization applied to CNNs: you compute the statistics in such a way that the recurrent/convolutional properties of the layer still hold after BN is applied.
For CNNs, this means computing the relevant statistics not just over the mini-batch, but also over the two spatial dimensions; in other words, the statistics are pooled over everything except the channel dimension, so each channel gets its own mean, variance, and learned scale/shift.
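Here is a minimal sketch (assuming PyTorch) that checks this: `nn.BatchNorm2d` computes its statistics per channel, over the batch and both spatial dimensions, which is what the manual reduction over dims (0, 2, 3) reproduces.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)           # (batch, channels, height, width)
bn = nn.BatchNorm2d(num_features=3)     # one (gamma, beta) pair per channel

y = bn(x)                               # training mode: uses batch statistics

# Manual check: mean/var over dims (0, 2, 3), i.e. everything except channels.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(y, manual, atol=1e-5))  # True (gamma=1, beta=0 at init)
```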
For RNNs, this means computing the relevant statistics over the mini-batch and the time/step dimension, so the normalization is applied only over the vector depths (the feature dimension). This also means that you only batch normalize the transformed input (so in the vertical direction, e.g. BN(W_x * x)), since the horizontal, hidden-to-hidden connections are time-dependent and their statistics shouldn't just be plainly averaged across steps. A sketch of this is below.
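A hedged sketch of that idea (again assuming PyTorch; the class name `BNRNNCell` is just illustrative, not a library API): batch norm is applied only to the input-to-hidden transform W_x * x_t, with statistics pooled over both the mini-batch and the time dimension, while the recurrent term W_h * h_{t-1} is left un-normalized.

```python
import torch
import torch.nn as nn

class BNRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.W_x = nn.Linear(input_size, hidden_size, bias=False)
        self.W_h = nn.Linear(hidden_size, hidden_size)
        # BatchNorm1d over the feature ("vector depth") dimension.
        self.bn = nn.BatchNorm1d(hidden_size)

    def forward(self, x, h=None):
        # x: (batch, time, input_size)
        batch, time, _ = x.shape
        if h is None:
            h = x.new_zeros(batch, self.hidden_size)

        # The input transform does not depend on h, so it can be computed for all
        # time steps at once and normalized over (batch * time) examples, i.e.
        # statistics over both the mini-batch and the time dimension.
        xt = self.W_x(x)                                        # (batch, time, hidden)
        xt = self.bn(xt.reshape(batch * time, -1)).reshape(batch, time, -1)

        outputs = []
        for t in range(time):
            # Only the "vertical" input term is batch-normalized; the recurrent
            # (hidden-to-hidden) term is used as-is.
            h = torch.tanh(xt[:, t] + self.W_h(h))
            outputs.append(h)
        return torch.stack(outputs, dim=1), h

cell = BNRNNCell(input_size=10, hidden_size=20)
out, h_T = cell(torch.randn(4, 7, 10))   # out: (4, 7, 20), h_T: (4, 20)
```

Note that published recurrent batch norm variants differ in the details (e.g. some keep separate statistics per time step, or also normalize the hidden-to-hidden term with careful initialization), so treat this as one reasonable reading of the scheme described above rather than the only option.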