I'm building an RNN and using the sequence_length parameter to supply a list of lengths for the sequences in a batch, and all sequences in a batch are padded to the same length.
For all framewise / feed-forward (non-recurrent) operations, masking the loss/cost is enough.
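For instance, a masked cross-entropy loss might look like the following minimal sketch (assuming TensorFlow 1.x; the tensor names and shapes here are illustrative, not from the question):

```python
import tensorflow as tf

def masked_loss(logits, labels, lengths):
    # logits: [batch, max_time, num_classes], labels: [batch, max_time], lengths: [batch]
    # Per-step cross-entropy, shape [batch, max_time]
    per_step = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    # 1.0 for real steps, 0.0 for zero-padded steps
    mask = tf.sequence_mask(lengths, maxlen=tf.shape(labels)[1], dtype=tf.float32)
    # Zero out the padded steps and average only over the real ones
    return tf.reduce_sum(per_step * mask) / tf.reduce_sum(mask)
```

This way the padded positions contribute nothing to the cost, and therefore nothing to the gradient of the feed-forward layers.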
For all sequence / recurrent operations (e.g. dynamic_rnn), there is always a sequence_length parameter which you need to set to the corresponding sequence lengths. Then there won't be a gradient for the zero-padded steps, or in other words, they will have zero contribution.