Adam optimizer goes haywire after 200k batches, training loss grows

爱一瞬间的悲伤 2021-01-30 05:15

I've been seeing a very strange behavior when training a network, where after a couple of 100k iterations (8 to 10 hours) of learning fine, everything breaks and the training loss grows.

2 Answers
  •  南方客 (OP)
     2021-01-30 06:08

    Yes, this could be some sort of complicated numerical-instability case, but most likely your learning rate is simply too high: your loss decreases quickly until about 25K iterations and then oscillates around the same level. Try decreasing the learning rate by a factor of 0.1 and see what happens; you should be able to reach an even lower loss value (see the sketch below).

    Keep exploring! :)
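
    A minimal sketch of that suggestion, assuming a PyTorch setup (the question does not name a framework); the toy model, data, and learning-rate values below are illustrative placeholders, not taken from the question:

    ```python
    import torch

    # Placeholder model and loss for illustration only.
    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()

    # If training diverged with, say, lr=1e-3, try an order of magnitude lower.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Synthetic batch standing in for the real data pipeline.
    x = torch.randn(64, 10)
    y = torch.randn(64, 1)

    for step in range(1000):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    ```

    The only change relative to a diverging run is the smaller `lr` passed to `torch.optim.Adam`; everything else in the training loop stays the same.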
