I am a beginner in deep learning. I know that in regular neural nets people use batch norm before the activation, and that it reduces the reliance on good weight initialization. I wonder whether batch norm can be used in RNNs as well.
The answer is Yes and No.
Why yes: the Layer Normalization paper (Ba et al., 2016) explicitly discusses the use of BN in RNNs.
Why no: to apply BN in an RNN, the mean/std of the layer output has to be computed and stored separately for each time step. Imagine you pad the sequence inputs so that all examples have the same length; if a sequence at prediction time is longer than every training sequence, then at some time steps there are no mean/std statistics accumulated during the SGD training procedure to normalize with.
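Here is a minimal NumPy sketch of that failure mode (the shapes and random data are made up purely for illustration): per-timestep statistics are estimated from a padded training batch, so a longer test sequence has time steps with no statistics at all.

```python
import numpy as np

# Toy padded training batch: 8 sequences, 10 timesteps, 4 features (illustrative shapes).
train = np.random.randn(8, 10, 4)

# Hypothetical per-timestep BN statistics gathered during training:
# one mean/std pair per timestep, estimated over the batch dimension.
means = train.mean(axis=0)  # shape (10, 4)
stds = train.std(axis=0)    # shape (10, 4)

# At inference a sequence of length 15 arrives: timesteps 10..14 have
# no stored statistics, so there is nothing to normalize them with.
test = np.random.randn(1, 15, 4)
print(means.shape[0], "timesteps of statistics vs", test.shape[1], "test timesteps")
```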
Meanwhile, at least in Keras, I believe the BN layer only normalizes in the "vertical" direction, i.e. the sequence output of the layer. The "horizontal" direction, i.e. the hidden state and cell state carried across time steps, is not normalized. Correct me if I am wrong here.
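As a sketch of what I mean (standard tf.keras layers; the layer sizes are arbitrary), the BatchNormalization layer sits on the LSTM's output sequence, while the hidden/cell states inside the recurrence are left untouched:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(None, 16))               # (timesteps, features)
x = layers.LSTM(32, return_sequences=True)(inputs)   # hidden/cell states inside the recurrence are NOT normalized
x = layers.BatchNormalization()(x)                   # normalizes only the sequence output ("vertical" direction)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
```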
In multi-layer RNNs, you may consider using layer normalization instead.
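For example, a minimal stacked-RNN sketch with tf.keras (again, sizes are arbitrary); LayerNormalization computes its statistics over the feature axis of each sample, so it does not depend on the batch or on how long the sequence is:

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(None, 16))
x = layers.LSTM(32, return_sequences=True)(inputs)
x = layers.LayerNormalization()(x)   # per-sample, per-timestep statistics: no running batch averages needed
x = layers.LSTM(32)(x)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
```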