I'm using TensorFlow v1.1, and I would like to implement a sequence-to-sequence model using the tf.contrib.seq2seq API.
However, I am having a hard time understanding how to use all the functions (BasicDecoder, dynamic_decode, Helper, TrainingHelper, ...) provided to build my model.
Here is my setup: I would like to "translate" a sequence of feature vectors of shape (batch_size, encoder_max_seq_len, feature_dim) into a sequence of a different length, (batch_size, decoder_max_len, 1).
I already have the encoder, an RNN with an LSTM cell, and I get its final state, which I would like to feed to the decoder as its initial state.
I also already have the cell for my decoder, a MultiRNNCell of LSTM cells.
Could you help me build the last part using the functions of tf.contrib.seq2seq and dynamic_decode? (Example code or explanations would be much appreciated.)
Here is my code:
import tensorflow as tf
from tensorflow.contrib import seq2seq
from tensorflow.contrib import rnn
import math
from data import gen_sum_2b2
class Seq2SeqModel:
    def __init__(self,
                 in_size,
                 out_size,
                 embed_size,
                 n_symbols,
                 cell_type,
                 n_units,
                 n_layers):
        self.in_size = in_size
        self.out_size = out_size
        self.embed_size = embed_size
        self.n_symbols = n_symbols
        self.cell_type = cell_type
        self.n_units = n_units
        self.n_layers = n_layers
        self.build_graph()
    def build_graph(self):
        self.init_placeholders()
        self.init_cells()
        self.encoder()
        self.decoder_train()
        self.loss()
        self.training()
    def init_placeholders(self):
        with tf.name_scope('Placeholders'):
            self.encoder_inputs = tf.placeholder(shape=(None, None, self.in_size),
                                                 dtype=tf.float32, name='encoder_inputs')
            self.decoder_targets = tf.placeholder(shape=(None, None),
                                                  dtype=tf.int32, name='decoder_targets')
            self.seqs_len = tf.placeholder(dtype=tf.int32)
            self.batch_size = tf.placeholder(tf.int32, name='dynamic_batch_size')
            self.max_len = tf.placeholder(tf.int32, name='dynamic_seq_len')
            decoder_inputs = tf.reshape(self.decoder_targets, shape=(self.batch_size,
                                                                     self.max_len, self.out_size))
            self.decoder_inputs = tf.cast(decoder_inputs, tf.float32)
            self.eos_step = tf.ones([self.batch_size, 1], dtype=tf.float32, name='EOS')
            self.pad_step = tf.zeros([self.batch_size, 1], dtype=tf.float32, name='PAD')
    def RNNCell(self):
        # Stack n_layers cells of the given type into a single multi-layer cell.
        c = rnn.MultiRNNCell([self.cell_type(self.n_units) for _ in range(self.n_layers)])
        return c
    def init_cells(self):
        with tf.variable_scope('RNN_enc_cell'):
            self.encoder_cell = self.RNNCell()
        with tf.variable_scope('RNN_dec_cell'):
            # Project the decoder cell's output onto the symbol vocabulary.
            self.decoder_cell = rnn.OutputProjectionWrapper(self.RNNCell(), self.n_symbols)
    def encoder(self):
        with tf.variable_scope('Encoder'):
            self.init_state = self.encoder_cell.zero_state(self.batch_size, tf.float32)
            _, self.encoder_final_state = tf.nn.dynamic_rnn(self.encoder_cell, self.encoder_inputs,
                                                            initial_state=self.init_state)
Decoding layer:
The decoding consists of two parts, because the decoder behaves differently during training and inference:
At inference time, the decoder input at a given time step always comes from the output of the previous time step. During training, however, the input is fixed to the actual target (the ground-truth target is fed back as input), which has been shown to improve performance.
Both cases are handled with methods from tf.contrib.seq2seq.
The main function for the decoder is seq2seq.dynamic_decode(), which performs dynamic decoding: tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations). This takes a Decoder instance and maximum_iterations (the maximum sequence length) as inputs.
1.1 The Decoder instance is built from seq2seq.BasicDecoder(cell, helper, initial_state, output_layer). The inputs are: cell (an RNNCell instance), helper (a Helper instance), initial_state (the initial state of the decoder, which should be the final state of the encoder) and output_layer (an optional dense layer applied to the outputs to make predictions).
1.2 The RNNCell instance can be an rnn.MultiRNNCell().
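As a minimal sketch of how these pieces nest (reusing decoder_cell and encoder_final_state from the code above; helper is a training or inference helper built as described in 1.3 below, and max_len is assumed to hold the maximum decoder length):

# A BasicDecoder bundles the cell, a helper and the encoder's final state;
# dynamic_decode then unrolls it for at most maximum_iterations steps.
decoder = seq2seq.BasicDecoder(cell=decoder_cell,
                               helper=helper,
                               initial_state=encoder_final_state)
# dynamic_decode returns (outputs, state) in TF 1.1 and
# (outputs, state, sequence_lengths) in TF 1.2+, hence the [0] to keep outputs.
decoder_outputs = seq2seq.dynamic_decode(decoder,
                                         maximum_iterations=max_len)[0]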
1.3 The helper instance is the part that differs between training and inference. During training, we want the ground-truth inputs to be fed to the decoder, while during inference we want the output of the decoder at time step t to be passed as the input to the decoder at time step t+1.
For training: we use the helper seq2seq.TrainingHelper(inputs, sequence_length), which just reads the given inputs.
For inference: we call seq2seq.GreedyEmbeddingHelper() or seq2seq.SampleEmbeddingHelper(), which differ in whether they take the argmax() of the outputs or sample (from a distribution), and pass the result through an embedding layer to get the next input.
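For example, the two helpers could be built like this (a sketch: decoder_inputs and seqs_len are the asker's placeholders above, while embedding_matrix, start_tokens and end_token are hypothetical names for an (n_symbols, embed_size) lookup table, a (batch_size,) vector of GO-symbol ids and a scalar EOS id):

# Training helper: read the ground-truth target at every time step.
train_helper = seq2seq.TrainingHelper(inputs=decoder_inputs,
                                      sequence_length=seqs_len)
# Inference helper: embed the previous (greedy argmax) prediction and feed it
# back as the next input; decoding stops when end_token is produced.
infer_helper = seq2seq.GreedyEmbeddingHelper(embedding=embedding_matrix,
                                             start_tokens=start_tokens,
                                             end_token=end_token)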
Putting it together: the Seq2Seq model
- Get the encoder state from the encoder layer and pass it as the initial_state to the decoder.
- Get the outputs of decoder train and decoder inference using seq2seq.dynamic_decode(). When you call both methods, make sure the weights are shared (use a variable_scope to reuse the weights), as in the sketch after this list.
- Then train the network using the loss function seq2seq.sequence_loss.
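A sketch of that assembly under the same assumptions (train_helper and infer_helper as built above; decoder_targets, seqs_len and max_len are the asker's placeholders; the names are illustrative, not a fixed API):

# Build the training decoder, then rebuild it with reuse=True so the
# inference decoder shares the same weights.
with tf.variable_scope('decoder'):
    train_decoder = seq2seq.BasicDecoder(decoder_cell, train_helper,
                                         encoder_final_state)
    train_outputs = seq2seq.dynamic_decode(train_decoder,
                                           maximum_iterations=max_len)[0]
with tf.variable_scope('decoder', reuse=True):
    infer_decoder = seq2seq.BasicDecoder(decoder_cell, infer_helper,
                                         encoder_final_state)
    infer_outputs = seq2seq.dynamic_decode(infer_decoder,
                                           maximum_iterations=max_len)[0]

# train_outputs.rnn_output holds the logits of shape
# (batch_size, decoder_max_len, n_symbols); mask padded steps out of the loss.
loss = seq2seq.sequence_loss(logits=train_outputs.rnn_output,
                             targets=decoder_targets,
                             weights=tf.sequence_mask(seqs_len, max_len,
                                                      dtype=tf.float32))
train_op = tf.train.AdamOptimizer().minimize(loss)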
Source: https://stackoverflow.com/questions/43622778/tensorflow-sequence-to-sequence-model-using-the-seq2seq-api-ver-1-1-and-above