问题
The attention weights are computed as:
I want to know what the h_s refers to.
In the tensorflow code, the encoder RNN returns a tuple:
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(...)
As I think, the h_s should be the encoder_state, but the github/nmt gives a different answer?
# attention_states: [batch_size, max_time, num_units]
attention_states = tf.transpose(encoder_outputs, [1, 0, 2])
# Create an attention mechanism
attention_mechanism = tf.contrib.seq2seq.LuongAttention(
    num_units, attention_states,
    memory_sequence_length=source_sequence_length)
Did I misunderstand the code? Or the h_s actually means the encoder_outputs?
回答1:
The formula is probably from this post, so I'll use a NN picture from the same post:
Here, the h-bar(s) are all the blue hidden states from the encoder (the last layer), and h(t) is the current red hidden state from the decoder (also the last layer). One the picture t=0, and you can see which blocks are wired to the attention weights with dotted arrows. The score function is usually one of those:
Tensorflow attention mechanism matches this picture. In theory, cell output is in most cases its hidden state (one exception is LSTM cell, in which the output is the short-term part of the state, and even in this case the output suits better for attention mechanism). In practice, tensorflow's encoder_state is different from encoder_outputs when the input is padded with zeros: the state is propagated from the previous cell state while the output is zero. Obviously, you don't want to attend to trailing zeros, so it makes sense to have h-bar(s) for these cells.
So encoder_outputs are exactly the arrows that go from the blue blocks upward. Later in a code, attention_mechanism is connected to each decoder_cell, so that its output goes through the context vector to the yellow block on the picture.
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    decoder_cell, attention_mechanism,
    attention_layer_size=num_units)
来源:https://stackoverflow.com/questions/48394009/what-does-the-source-hidden-state-refer-to-in-the-attention-mechanism