Loss suddenly increases with Adam Optimizer in Tensorflow


My experience over the last few months is the following: Adam is very easy to use because you don't have to tune the initial learning rate much, and it almost always works. However, when it comes to convergence, Adam does not really settle on a solution but keeps jiggling around at higher iteration counts, while SGD gives an almost perfectly shaped loss plot and seems to converge much better late in training. On the other hand, changing small parts of the setup requires adjusting the SGD parameters, or you will end up with NaNs... For experiments on architectures and general approaches I favor Adam, but if you want to get the best version of one chosen architecture, you should use SGD and at least compare the solutions.
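A minimal sketch of the kind of comparison I mean (not my actual code; the model and the `x_train`/`y_train` data are placeholders): train the same architecture once with Adam and once with SGD, then compare the loss curves.

```python
import tensorflow as tf

def build_model():
    # Placeholder architecture; substitute your own.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])

for opt in (tf.keras.optimizers.Adam(learning_rate=1e-3),
            tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)):
    model = build_model()
    model.compile(optimizer=opt, loss="mse")
    # x_train / y_train assumed to exist; inspect history.history["loss"]
    # to see whether the run settles or keeps jiggling.
    history = model.fit(x_train, y_train, epochs=50, verbose=0)
    print(type(opt).__name__, history.history["loss"][-1])
```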

I also noticed that a good initial SGD setup (learning rate, weight decay, etc.) converges about as fast as Adam, at least for my setup. Hope this may help some of you!
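For illustration, a hedged sketch of what such an SGD setup could look like in TensorFlow/Keras: a decaying learning rate plus L2 weight decay. The specific values are assumptions for the example, not numbers from my runs.

```python
import tensorflow as tf

# Decaying learning-rate schedule; the starting value and decay rate
# are illustrative and need tuning per model.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=10000,
    decay_rate=0.9,
)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

# Weight decay via an L2 kernel regularizer on each layer (one common way
# to add it when the optimizer itself has no weight-decay argument).
dense = tf.keras.layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),
)
```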

EDIT: Please note that the effects in my initial question are NOT normal, even with Adam. It seems I had a bug, but I can't really remember the details anymore.
