Loss in TensorFlow suddenly turns into NaN

Submitted by 匆匆过客 on 2019-12-05 20:27:01

Quite often, these NaNs come from a divergence of the optimization due to exploding gradients. They usually don't appear all at once, but rather after a phase in which the loss increases suddenly and, within a few steps, reaches Inf. The reason you don't see this explosive increase is probably that you check your loss only once per epoch; try displaying the loss every step (or every few steps) and you are likely to see this phenomenon.
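
A minimal sketch of per-step loss logging with a Keras callback (the model and data here are made up for illustration; substitute your own):

    import numpy as np
    import tensorflow as tf

    # Dummy data and model, just to make the example runnable.
    x = np.random.rand(256, 4).astype("float32")
    y = np.random.rand(256, 1).astype("float32")
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

    class StepLossLogger(tf.keras.callbacks.Callback):
        """Print the loss after every batch, so a sudden blow-up
        toward Inf/NaN is visible before it averages out per epoch."""
        def on_train_batch_end(self, batch, logs=None):
            print(f"step {batch}: loss = {logs['loss']:.6f}")

    model.fit(x, y, epochs=2, batch_size=32, verbose=0,
              callbacks=[StepLossLogger()])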

As for why your gradient explodes suddenly, I would suggest you try without tf.sqrt in your loss function; this should be more numerically stable. tf.sqrt has the bad property of an exploding gradient near zero: its derivative is 1/(2·sqrt(x)), which is unbounded as x approaches 0. This means the risk of divergence increases precisely as you get close to a solution, which looks a lot like what you are observing.
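
A small demonstration of this (not the asker's code), plus two common mitigations; `preds` and `targets` below are hypothetical placeholders:

    import tensorflow as tf

    # d/dx sqrt(x) = 1 / (2 * sqrt(x)) blows up as x -> 0.
    x = tf.constant([1.0, 1e-4, 1e-8])
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = tf.sqrt(x)
    print(tape.gradient(y, x).numpy())  # [5.e-01 5.e+01 5.e+03] -- unbounded growth

    # Mitigation 1: drop the sqrt and minimize the squared error directly.
    # The minimizer is the same, but the gradient stays bounded near it:
    #     loss = tf.reduce_mean(tf.square(preds - targets))
    # Mitigation 2: if the sqrt is required (e.g. true RMSE), add a small
    # epsilon inside it to keep the gradient finite:
    #     loss = tf.sqrt(tf.reduce_mean(tf.square(preds - targets)) + 1e-8)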
