Default Initialization for Tensorflow LSTM states and weights?

Submitted by 痴心易碎 on 2020-05-26 09:54:27

Question


I am using the LSTM cell in Tensorflow.

lstm_cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)

I was wondering how the weights and states are initialized or rather what the default initializer is for LSTM cells (states and weights) in Tensorflow?

And is there an easy way to set an initializer manually?

Note: as far as I could find in the documentation, tf.get_variable() uses the glorot_uniform_initializer by default.
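For reference, a hedged sketch of what glorot_uniform does (plain Python, not the TensorFlow implementation): weights are drawn uniformly from [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out)):

```python
import math
import random

def glorot_uniform(fan_in, fan_out, seed=None):
    """Sketch of Glorot/Xavier uniform initialization:
    each weight is sampled from U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out))."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Illustrative sizes: a 128 -> 64 weight matrix.
weights = glorot_uniform(fan_in=128, fan_out=64, seed=0)
```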


Answer 1:


First of all, there is a distinction between the weights of an LSTM (the usual parameter set of an ANN), which are by default initialized with the Glorot initializer, also known as the Xavier initializer (as mentioned in the question), and the state of the LSTM.

A different aspect is the cell state and the initial recurrent input to the LSTM. Those are initialized by a tensor usually denoted as initial_state.

That leaves us with the question of how to initialize this initial_state:

  1. Zero State Initialization is good practice if the impact of initialization is low

The default approach to initializing the state of an RNN is to use a zero state. This often works well, particularly for sequence-to-sequence tasks like language modeling where the proportion of outputs that are significantly impacted by the initial state is small.
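In TensorFlow this zero state is what cell.zero_state(batch_size, dtype) returns. As a minimal plain-Python sketch (the sizes are illustrative), an LSTM's zero state is a pair (c, h) of zero-filled matrices:

```python
def zero_state(batch_size, num_units):
    """Sketch of an LSTM zero state: a (c, h) pair of
    [batch_size, num_units] matrices filled with zeros,
    mirroring what cell.zero_state(batch_size, dtype) produces."""
    c = [[0.0] * num_units for _ in range(batch_size)]  # cell state
    h = [[0.0] * num_units for _ in range(batch_size)]  # hidden state
    return c, h

c, h = zero_state(batch_size=4, num_units=8)
```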

  2. Zero State Initialization in each batch can lead to overfitting

Zero initialization at every batch has the following effect: losses at the early steps of a sequence-to-sequence model (i.e., those immediately after a state reset) will be larger than at later steps, because there is less history, so their contribution to the gradient during learning is relatively higher. If every state reset is associated with the zero state, the model can (and will) learn to compensate for precisely this. As the ratio of state resets to total observations increases, the model parameters become increasingly tuned to this zero state, which may hurt performance on later time steps.

  3. Do we have other options?

One simple solution is to make the initial state noisy (to decrease the loss for the first time step). Look here for details and other ideas.
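A minimal sketch of that idea (plain Python; the noise scale 0.1 is an arbitrary assumption, not a recommended value): start from the zero state and perturb each entry with small Gaussian noise, so the model cannot overfit to one fixed reset state:

```python
import random

def noisy_zero_state(batch_size, num_units, stddev=0.1, seed=0):
    """Sketch: initialize the (c, h) state pair with small
    Gaussian noise instead of exact zeros."""
    rng = random.Random(seed)
    def noise():
        return [[rng.gauss(0.0, stddev) for _ in range(num_units)]
                for _ in range(batch_size)]
    return noise(), noise()

c, h = noisy_zero_state(batch_size=4, num_units=8)
```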




Answer 2:


I don't think you can initialize an individual cell, but when you execute the LSTM with tf.nn.static_rnn or tf.nn.dynamic_rnn, you can set the initial_state argument to a tensor (for an LSTM, a tuple of cell-state and hidden-state tensors) containing the LSTM's initial values.
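To illustrate the mechanics with a toy plain-Python recurrence (not the real tf.nn.dynamic_rnn API; the single-unit weights w_x and w_h are arbitrary illustrative values), the unrolling function simply threads whatever initial_state you pass in through every time step:

```python
import math

def unroll_rnn(inputs, initial_state, w_x=0.5, w_h=0.5):
    """Toy single-unit RNN sketch: like dynamic_rnn, it starts
    the recurrence from the supplied initial_state and carries
    the state forward across time steps."""
    state = initial_state
    outputs = []
    for x in inputs:
        state = math.tanh(w_x * x + w_h * state)
        outputs.append(state)
    return outputs, state

outputs, final_state = unroll_rnn([1.0, 0.0, -1.0], initial_state=0.0)
```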



Source: https://stackoverflow.com/questions/49223976/default-initialization-for-tensorflow-lstm-states-and-weights
