Adam optimizer goes haywire after 200k batches, training loss grows

爱一瞬间的悲伤 2021-01-30 05:15

I've been seeing a very strange behavior when training a network, where after a couple of 100k iterations (8 to 10 hours) of learning fine, everything breaks and the training loss grows.

2 Answers
  •  南方客 (OP)
     2021-01-30 06:08

    Yes, this could be some sort of complicated numerical-instability case, but most likely your learning rate is simply too high: your loss decreases quickly until about 25K iterations and then oscillates around the same level. Try decreasing the learning rate by a factor of 0.1 and see what happens; you should be able to reach an even lower loss value (see the sketch below).

    Keep exploring! :)
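
    A minimal sketch of that suggestion, assuming a PyTorch setup (the question does not name a framework); the toy model, data, and learning-rate values below are illustrative placeholders, not taken from the question:

    ```python
    import torch

    # Placeholder model and loss for illustration only.
    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()

    # If training diverged with, say, lr=1e-3, try an order of magnitude lower.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Synthetic batch standing in for the real data pipeline.
    x = torch.randn(64, 10)
    y = torch.randn(64, 1)

    for step in range(1000):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    ```

    The only change relative to a diverging run is the smaller `lr` passed to `torch.optim.Adam`; everything else in the training loop stays the same.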
