Should we do learning rate decay for the Adam optimizer?

深忆病人 · 2021-01-29 19:10

I'm training a network for image localization with the Adam optimizer, and someone suggested that I use exponential decay. I don't want to try that because the Adam optimizer itself decays the learning rate.

4 Answers
  •  忘掉有多难
    2021-01-29 19:25

    In my experience it is usually not necessary to do learning rate decay with the Adam optimizer.

    The theory is that Adam already handles learning-rate adaptation on its own (see the Adam paper, quoted below):

    "We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients; the name Adam is derived from adaptive moment estimation."

    As with any deep learning problem, YMMV: one size does not fit all, so try different approaches and see what works for your problem. If you do decide to add a schedule anyway, one way to set it up is sketched below.
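    A common way to try exponential decay on top of Adam is to wrap the optimizer in a learning-rate scheduler. A minimal PyTorch sketch, assuming that framework (the question doesn't name one), with a placeholder model, dummy data, and an arbitrary gamma:

    ```python
    import torch

    model = torch.nn.Linear(10, 2)                                  # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

    for epoch in range(10):
        optimizer.zero_grad()
        loss = model(torch.randn(4, 10)).pow(2).mean()              # dummy loss for illustration
        loss.backward()
        optimizer.step()                                            # Adam's adaptive update
        scheduler.step()                                            # multiply the base lr by gamma once per epoch
    ```

    Whether this helps depends on the task; the decayed base lr simply shrinks every per-parameter step that Adam computes.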
