Question:
I started working with TensorFlow not long ago. I'm working on the seq2seq model and somehow got the tutorial to work, but I'm stuck on getting the states of each sentence.
As far as I understand, the seq2seq model takes an input sequence and generates a hidden state for it through an RNN. Later, the model uses that hidden state to generate a new sequence of data.
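To make the setup concrete, here is a minimal, framework-free sketch of what I mean (plain NumPy with made-up dimensions, not the tutorial's actual code): a vanilla RNN folds the token sequence into one final hidden vector.

import numpy as np

# Illustrative sizes only; a real model learns these weights.
vocab_size, embed_size, hidden_size = 1000, 32, 64
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embed_size))       # embedding table
W_xh = rng.normal(size=(embed_size, hidden_size))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights

def encode(token_ids):
    # Run a vanilla RNN over the tokens and return the final hidden state.
    h = np.zeros(hidden_size)
    for t in token_ids:                  # [token1, token2, ..., tokenN]
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h                             # the "hidden state of the sequence"

state = encode([1, 42, 7])               # shape: (hidden_size,)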
My problem is: what should I do if I want to use the hidden state of the input sequence directly? For example, given a trained model, how should I get the final hidden state of an input sequence [token1, token2, ..., tokenN]?
I've been stuck on this for two days; I've tried many different methods, but none of them work.
Answer 1:
In the seq2seq model, the encoder is always an RNN, called through rnn.rnn.
The call to rnn.rnn returns outputs and state, so to get just the state you can do this:
_, encoder_state = rnn.rnn(encoder_cell, encoder_inputs, dtype=dtype)
It's done in the same way in the seq2seq module. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py#L103
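For a self-contained illustration, here is a sketch of capturing the encoder state outside the tutorial code, written against the pre-1.0 static-RNN API this answer targets. The GRU cell, sizes, and placeholder inputs are assumptions, not part of the tutorial, and the exact module path depends on your TensorFlow version.

import tensorflow as tf

# Assumed sizes; match them to your trained model.
batch_size, input_size, hidden_size, num_steps = 32, 128, 256, 10

# In the oldest releases this cell lives under tensorflow.models.rnn.rnn_cell.
encoder_cell = tf.nn.rnn_cell.GRUCell(hidden_size)

# The old static API takes a Python list with one [batch, input] tensor
# per time step and returns (outputs, final_state).
encoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(num_steps)]
_, encoder_state = tf.nn.rnn(encoder_cell, encoder_inputs, dtype=tf.float32)
# encoder_state is the hidden state after the last input token is read.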
Answer 2:
Alright, I guess my issue is that I don't really know how to write code in the TensorFlow style, so I kind of brute-forced it.
(* marks the lines to modify)
At python/ops/seq2seq, modify model_with_buckets():
losses = []
outputs = []
*states = []
with ops.op_scope(all_inputs, name, "model_with_buckets"):
  for j in xrange(len(buckets)):
    if j > 0:
      vs.get_variable_scope().reuse_variables()
    bucket_encoder_inputs = [encoder_inputs[i] for i in xrange(buckets[j][0])]
    bucket_decoder_inputs = [decoder_inputs[i] for i in xrange(buckets[j][1])]
    *bucket_outputs, _, bucket_states = seq2seq(bucket_encoder_inputs,
                                                bucket_decoder_inputs)
    outputs.append(bucket_outputs)
    states.append(bucket_states)
    bucket_targets = [targets[i] for i in xrange(buckets[j][1])]
    bucket_weights = [weights[i] for i in xrange(buckets[j][1])]
    losses.append(sequence_loss(
        outputs[-1], bucket_targets, bucket_weights, num_decoder_symbols,
        softmax_loss_function=softmax_loss_function))
return outputs, losses, *states
At python/ops/seq2seq, modify embedding_attention_seq2seq():
if isinstance(feed_previous, bool):
  *outputs, states = embedding_attention_decoder(
      decoder_inputs, encoder_states[-1], attention_states, cell,
      num_decoder_symbols, num_heads, output_size, output_projection,
      feed_previous)
  *return outputs, states, tf.constant(encoder_states[-1])
else:  # If feed_previous is a Tensor, we construct 2 graphs and use cond.
  outputs1, states1 = embedding_attention_decoder(
      decoder_inputs, encoder_states[-1], attention_states, cell,
      num_decoder_symbols, num_heads, output_size, output_projection, True)
  vs.get_variable_scope().reuse_variables()
  outputs2, states2 = embedding_attention_decoder(
      decoder_inputs, encoder_states[-1], attention_states, cell,
      num_decoder_symbols, num_heads, output_size, output_projection, False)
  outputs = control_flow_ops.cond(feed_previous,
                                  lambda: outputs1, lambda: outputs2)
  states = control_flow_ops.cond(feed_previous,
                                 lambda: states1, lambda: states2)
  *return outputs, states, tf.constant(encoder_states[-1])
At model/rnn/translate/seq2seq_model.py, modify __init__():
if forward_only:
  *self.outputs, self.losses, self.states = seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, self.target_vocab_size,
      lambda x, y: seq2seq_f(x, y, True),
      softmax_loss_function=softmax_loss_function)
  # If we use output projection, we need to project outputs for decoding.
  if output_projection is not None:
    for b in xrange(len(buckets)):
      self.outputs[b] = [tf.nn.xw_plus_b(output, output_projection[0],
                                         output_projection[1])
                         for output in self.outputs[b]]
else:
  *self.outputs, self.losses, _ = seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, self.target_vocab_size,
      lambda x, y: seq2seq_f(x, y, False),
      softmax_loss_function=softmax_loss_function)
At model/rnn/translate/seq2seq_model.py, modify step():
if not forward_only:
  return outputs[1], outputs[2], None  # Gradient norm, loss, no outputs.
else:
  *return None, outputs[0], outputs[1:-1], outputs[-1]  # No gradient norm; loss, output logits, encoder states.
With all of this done, we can get the encoded states by calling:
_, _, _, states = model.step(all_other_arguments, forward_only=True)
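The returned states is then an ordinary NumPy array (typically of shape [batch_size, state_size]), so it can serve directly as a fixed-length sentence representation. For example, a sketch comparing two encoded sentences by cosine similarity; the 256-dim dummy vectors just stand in for rows of states from two separate step() calls:

import numpy as np

def cosine(u, v):
    # Cosine similarity between two sentence-state vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# In practice these would be rows of `states` from two separate
# model.step(..., forward_only=True) calls; dummies keep the sketch runnable.
states_a = np.random.rand(256)
states_b = np.random.rand(256)
print(cosine(states_a, states_b))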
Answer 3:
bearsteak's answer above is great, but it is based on tensorflow-0.6, which is quite out of date. So here is an updated version of his answer for tensorflow-0.8, which is also similar to the newest version.
(* marks the lines to modify)
At python/ops/seq2seq, modify model_with_buckets():
losses = []
outputs = []
*states = []
with ops.op_scope(all_inputs, name, "model_with_buckets"):
  for j, bucket in enumerate(buckets):
    with variable_scope.variable_scope(variable_scope.get_variable_scope(),
                                       reuse=True if j > 0 else None):
      *bucket_outputs, _, bucket_states = seq2seq(encoder_inputs[:bucket[0]],
                                                  decoder_inputs[:bucket[1]])
      outputs.append(bucket_outputs)
      *states.append(bucket_states)
      if per_example_loss:
        losses.append(sequence_loss_by_example(
            outputs[-1], targets[:bucket[1]], weights[:bucket[1]],
            softmax_loss_function=softmax_loss_function))
      else:
        losses.append(sequence_loss(
            outputs[-1], targets[:bucket[1]], weights[:bucket[1]],
            softmax_loss_function=softmax_loss_function))
return outputs, losses, *states
At python/ops/seq2seq, modify embedding_attention_seq2seq():
if isinstance(feed_previous, bool):
  *outputs, states = embedding_attention_decoder(
      decoder_inputs, encoder_state, attention_states, cell,
      num_decoder_symbols, embedding_size, num_heads=num_heads,
      output_size=output_size, output_projection=output_projection,
      feed_previous=feed_previous,
      initial_state_attention=initial_state_attention)
  *return outputs, states, encoder_state

# If feed_previous is a Tensor, we construct 2 graphs and use cond.
def decoder(feed_previous_bool):
  reuse = None if feed_previous_bool else True
  with variable_scope.variable_scope(variable_scope.get_variable_scope(),
                                     reuse=reuse):
    outputs, state = embedding_attention_decoder(
        decoder_inputs, encoder_state, attention_states, cell,
        num_decoder_symbols, embedding_size, num_heads=num_heads,
        output_size=output_size, output_projection=output_projection,
        feed_previous=feed_previous_bool,
        update_embedding_for_previous=False,
        initial_state_attention=initial_state_attention)
    return outputs + [state]

outputs_and_state = control_flow_ops.cond(feed_previous,
                                          lambda: decoder(True),
                                          lambda: decoder(False))
*return outputs_and_state[:-1], outputs_and_state[-1], encoder_state
At model/rnn/translate/seq2seq_model.py, modify __init__():
if forward_only:
  *self.outputs, self.losses, self.states = tf.nn.seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
      softmax_loss_function=softmax_loss_function)
  # If we use output projection, we need to project outputs for decoding.
  if output_projection is not None:
    for b in xrange(len(buckets)):
      self.outputs[b] = [
          tf.matmul(output, output_projection[0]) + output_projection[1]
          for output in self.outputs[b]
      ]
else:
  *self.outputs, self.losses, _ = tf.nn.seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, False),
      softmax_loss_function=softmax_loss_function)
At model/rnn/translate/seq2seq_model.py, modify step():
if not forward_only:
  return outputs[1], outputs[2], None  # Gradient norm, loss, no outputs.
else:
  *return None, outputs[0], outputs[1:], outputs[-1]  # No gradient norm, loss, outputs.
With all of this done, we can get the encoded states in translate.py by calling:
_, _, output_logits, states = model.step(sess, encoder_inputs, decoder_inputs,
                                          target_weights, bucket_id, True)
print(states)