Setting initial state in dynamic RNN

Question

Based on the link:

https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn

In the docs, the first example sets an "initial state" and the second example does not. Could anyone please explain what the purpose of the initial state is? What's the difference if I don't set it vs. if I set it? Is it only required in a single RNN cell and not in a stacked cell like in the example provided in the link?

I'm currently debugging my RNN model, as it seems to classify different questions into the same category, which is strange. I suspect that it might have to do with me not setting the initial state of the cell.

Answer 1:

Could anyone please explain what is the purpose of initial state?

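The state is the vector of hidden activations (for an LSTM, the cell state plus the hidden state) that the cell passes from one time step to the next. It is not a weight matrix: the recurrent weights that connect the hidden units of consecutive time steps are ordinary trained variables, whereas the state is recomputed at every step from the current input and the previous state. That is how it carries temporal information forward from earlier time steps.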

The initial_state= argument supplies the state that the cell sees at the first time step, so passing a non-zero value gives the RNN cell a memory of earlier activations before it reads the first input.

What's the difference if I don't set it vs if I set it?

If you pass a state produced by another model or by an earlier run of the same model (for example, the final state returned by a previous dynamic_rnn call), you restore the memory of the RNN cell so that it does not start the sequence from a blank state.

In the TF docs, the first example sets initial_state to the cell's zero_state, i.e. a state of all zeros.

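If you don't set initial_state, you must pass dtype instead, and dynamic_rnn then creates the same all-zero state for you. The initial state is not a trained variable, so for a zero state there is no difference between setting it and leaving it out.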
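As a rough illustration of what the initial state does, here is a toy scalar RNN in plain Python. It is a sketch, not TensorFlow's implementation: run_rnn, w_x and w_h are made-up names, and treating a missing initial state as zero is an assumption chosen to mirror what dynamic_rnn does when only dtype is given.

```python
import math

def run_rnn(inputs, w_x, w_h, initial_state=None):
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    # Assumption for this sketch: no initial state means "start from zero".
    h = 0.0 if initial_state is None else initial_state
    outputs = []
    for x in inputs:
        # The weights w_x and w_h stay fixed; only the state h changes.
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs, h

inputs = [0.5, -1.0, 0.25]
w_x, w_h = 0.8, 0.3  # hypothetical "trained" weights

# Leaving the state out and passing an explicit zero give the same outputs.
default_out, default_final = run_rnn(inputs, w_x, w_h)
zeros_out, zeros_final = run_rnn(inputs, w_x, w_h, initial_state=0.0)

# A non-zero initial state changes the outputs from the first step onward.
warm_out, warm_final = run_rnn(inputs, w_x, w_h, initial_state=0.9)
```

Here default_out and zeros_out are identical, while warm_out differs, even though the weights are the same in all three calls.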

Is it only required in a single RNN cell and not in a stacked cell like in the example provided in the link?

I don't know exactly why they didn't set initial_state in the stacked RNN example, but every type of RNN, single or stacked, starts from some initial state, since the state is what preserves the temporal features across time steps. When the argument is omitted, the stacked cell simply starts from zeros.

Maybe the stacked RNN itself was the point of interest in the docs, not the setting of initial_state.

Tip:

In most cases, you will not need to set initial_state for an RNN; TensorFlow handles it for you. In the case of a seq2seq RNN, this argument may be used, for example to start the decoder from the encoder's final state.
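To make the "hand a state over" case concrete, here is a minimal sketch in plain Python, again a toy rather than TensorFlow code (run_rnn, w_x and w_h are hypothetical names). It feeds the final state of one call into the next call as its initial state, which is the same idea as starting a decoder from an encoder's final state.

```python
import math

def run_rnn(inputs, w_x, w_h, initial_state=0.0):
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = initial_state
    outputs = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs, h

sequence = [0.5, -1.0, 0.25, 0.75]
w_x, w_h = 0.8, 0.3  # hypothetical weights, shared by every call

# Reference: process the whole sequence in one go.
full_out, full_final = run_rnn(sequence, w_x, w_h)

# Process it in two chunks, passing the first chunk's final state on.
first_out, first_final = run_rnn(sequence[:2], w_x, w_h)
second_out, second_final = run_rnn(sequence[2:], w_x, w_h,
                                   initial_state=first_final)

# Same second chunk, but starting from zero: the earlier context is lost.
reset_out, _ = run_rnn(sequence[2:], w_x, w_h)
```

With the state carried over, first_out + second_out matches full_out; with the state reset to zero, the second chunk's outputs differ.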

Your RNN may be facing some other issue. An RNN builds up its own memory as it reads the sequence and doesn't need to be primed with a state.

Source: https://stackoverflow.com/questions/56140870/setting-initial-state-in-dynamic-rnn

Tags

tensorflow

recurrent-neural-network