Seq2Seq model learns to only output EOS token (</s>) after a few iterations


Recently I have also been working on a seq2seq model. I encountered your problem before; in my case, I solved it by changing the loss function.

You said you use a mask, so I guess you use tf.contrib.seq2seq.sequence_loss as I did.

I switched to tf.nn.softmax_cross_entropy_with_logits, and it works normally (at a higher computational cost).
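
Roughly, what I mean looks like the following sketch (TF 1.x). The placeholder definitions and num_decoder_symbols value are only illustrative assumptions; the idea is to compute per-position cross-entropy yourself and apply the mask manually:

import tensorflow as tf

num_decoder_symbols = 1000  # assumed vocabulary size, for illustration only

decoder_logits = tf.placeholder(tf.float32, [None, None, num_decoder_symbols])  # [batch_size, sequence_length, num_decoder_symbols]
decoder_targets = tf.placeholder(tf.int32, [None, None])   # [batch_size, sequence_length], token ids
masks = tf.placeholder(tf.float32, [None, None])            # 1.0 for real tokens, 0.0 for padding

# Per-position cross-entropy, then zero out padding and average over real tokens.
one_hot_targets = tf.one_hot(decoder_targets, depth=num_decoder_symbols)
crossent = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_targets,
                                                   logits=decoder_logits)  # [batch_size, sequence_length]
loss = tf.reduce_sum(crossent * masks) / tf.reduce_sum(masks)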

(Edit 05/10/2018: pardon me, I need to edit this since I found an egregious mistake in my code.)

tf.contrib.seq2seq.sequence_loss can work really well if the shapes of logits, targets, and weights are right, as defined in the official documentation: tf.contrib.seq2seq.sequence_loss

loss=tf.contrib.seq2seq.sequence_loss(logits=decoder_logits,
                                      targets=decoder_targets,
                                      weights=masks) 

#logits:  [batch_size, sequence_length, num_decoder_symbols]  
#targets: [batch_size, sequence_length] 
#weights: [batch_size, sequence_length] 

Well, it can still run even if the shapes are not met, but the result can be weird (lots of #EOS #PAD... etc.).

The decoder_logits and decoder_targets might not have the required shapes (in my case, my decoder_targets had shape [sequence_length, batch_size], i.e., time-major). If that is your situation, use tf.transpose to fix the axis order before computing the loss, as in the sketch below.
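
A rough sketch of that fix, assuming time-major logits and targets as in my case (the placeholders and num_decoder_symbols are only for illustration):

import tensorflow as tf

num_decoder_symbols = 1000  # assumed vocabulary size

# Time-major tensors: [sequence_length, batch_size, ...]
decoder_logits_tm = tf.placeholder(tf.float32, [None, None, num_decoder_symbols])
decoder_targets_tm = tf.placeholder(tf.int32, [None, None])
masks = tf.placeholder(tf.float32, [None, None])  # batch-major: [batch_size, sequence_length]

# Transpose to the batch-major layout that sequence_loss expects.
decoder_logits = tf.transpose(decoder_logits_tm, perm=[1, 0, 2])  # [batch_size, sequence_length, num_decoder_symbols]
decoder_targets = tf.transpose(decoder_targets_tm, perm=[1, 0])   # [batch_size, sequence_length]

loss = tf.contrib.seq2seq.sequence_loss(logits=decoder_logits,
                                        targets=decoder_targets,
                                        weights=masks)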
