Should we do learning rate decay for the Adam optimizer?

深忆病人 · 2021-01-29 19:10

I'm training a network for image localization with the Adam optimizer, and someone suggested that I use exponential decay. I don't want to try that because the Adam optimizer itself decays the learning rate.

4 Answers
  •  忘掉有多难
    2021-01-29 19:25

    In my experience it is usually not necessary to do learning rate decay with the Adam optimizer.

    The theory is that Adam already handles learning-rate adaptation on its own (see the Adam paper, quoted below):

    "We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients; the name Adam is derived from adaptive moment estimation."

    As with any deep learning problem, YMMV: one size does not fit all, so try different approaches and see what works for your problem. If you do decide to add a schedule anyway, one way to set it up is sketched below.
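    A common way to try exponential decay on top of Adam is to wrap the optimizer in a learning-rate scheduler. A minimal PyTorch sketch, assuming that framework (the question doesn't name one), with a placeholder model, dummy data, and an arbitrary gamma:

    ```python
    import torch

    model = torch.nn.Linear(10, 2)                                  # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

    for epoch in range(10):
        optimizer.zero_grad()
        loss = model(torch.randn(4, 10)).pow(2).mean()              # dummy loss for illustration
        loss.backward()
        optimizer.step()                                            # Adam's adaptive update
        scheduler.step()                                            # multiply the base lr by gamma once per epoch
    ```

    Whether this helps depends on the task; the decayed base lr simply shrinks every per-parameter step that Adam computes.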
