Should we do learning rate decay for the Adam optimizer?

深忆病人 2021-01-29 19:10

I'm training a network for image localization with the Adam optimizer, and someone suggested that I use exponential decay. I don't want to try that because the Adam optimizer itself decays the learning rate.

4 Answers
  •  野性不改
    2021-01-29 19:47

    Adam has a single learning rate, but it is a maximum rate that is adapted per parameter, so I don't think many people use learning rate scheduling with it.

    Due to the adaptive nature, the default rate is fairly robust, but there may be times when you want to optimize it. What you can do is find an optimal default rate beforehand: start with a very small rate and increase it until the loss stops decreasing, then look at the slope of the loss curve and pick the learning rate associated with the fastest decrease in loss (not the point where the loss is actually lowest). Jeremy Howard mentions this in the fast.ai deep learning course, and it comes from the Cyclical Learning Rates paper.
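
    A minimal sketch of that learning rate range test, assuming PyTorch; the model, data loader, and loss function names are placeholders, not anything from the original question:

        import torch

        def lr_range_test(model, train_loader, loss_fn, min_lr=1e-7, max_lr=1.0, num_steps=100):
            """Increase the learning rate exponentially each batch and record the loss;
            afterwards plot loss vs. rate and pick the rate where loss falls fastest."""
            optimizer = torch.optim.Adam(model.parameters(), lr=min_lr)
            gamma = (max_lr / min_lr) ** (1.0 / num_steps)   # multiplicative increase per batch
            scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

            lrs, losses = [], []
            model.train()
            for step, (inputs, targets) in enumerate(train_loader):
                if step >= num_steps:
                    break
                optimizer.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                optimizer.step()
                lrs.append(scheduler.get_last_lr()[0])   # rate used for this batch
                losses.append(loss.item())
                scheduler.step()
            return lrs, losses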

    Edit: People have fairly recently started using one-cycle learning rate policies in conjunction with Adam with great results.
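
    For reference, a sketch of the one-cycle policy on top of Adam, assuming PyTorch's built-in OneCycleLR scheduler; again the model, loader, and loss names are placeholders:

        import torch

        def train_one_cycle(model, train_loader, loss_fn, max_lr, epochs=10):
            """Train with Adam plus the one-cycle policy: the rate ramps up to max_lr,
            then anneals back down over the course of the whole run."""
            optimizer = torch.optim.Adam(model.parameters(), lr=max_lr / 10)
            scheduler = torch.optim.lr_scheduler.OneCycleLR(
                optimizer,
                max_lr=max_lr,                    # peak rate, e.g. from the range test above
                steps_per_epoch=len(train_loader),
                epochs=epochs,
            )
            for _ in range(epochs):
                for inputs, targets in train_loader:
                    optimizer.zero_grad()
                    loss = loss_fn(model(inputs), targets)
                    loss.backward()
                    optimizer.step()
                    scheduler.step()              # OneCycleLR steps once per batch

    Note that OneCycleLR also cycles the momentum term (Adam's beta1) by default, which is part of the original one-cycle recipe.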
