I'm training a network for image localization with the Adam optimizer, and someone suggested that I use exponential decay. I don't want to try that because the Adam optimizer itself decays the learning rate.
In my experience it is usually not necessary to do learning rate decay with the Adam optimizer.
The theory is that Adam already handles learning rate adaptation for you (see the Adam paper, Kingma & Ba, 2014):
"We propose Adam, a method for efficient stochastic optimization that only requires first-order gradients with little memory requirement. The method computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients; the name Adam is derived from adaptive moment estimation."
As with any deep learning problem, YMMV: one size does not fit all, so try both approaches and see which one works better for your data.
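If you do want to compare the two setups, here is a minimal sketch in PyTorch (using a toy network and synthetic data as stand-ins for your localization model); dropping the scheduler lines gives you plain Adam to compare against:

```python
import torch
from torch import nn

# Toy stand-in for a localization network and data; replace with your own.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.MSELoss()
inputs = torch.randn(256, 16)
targets = torch.randn(256, 4)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Exponential decay layered on top of Adam's per-parameter adaptation.
# Remove the scheduler (and scheduler.step()) to run plain Adam instead.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()  # multiplies the base lr by gamma once per epoch
    print(epoch, scheduler.get_last_lr()[0], loss.item())
```

Note that the scheduler only decays the global base learning rate; Adam's per-parameter scaling from the moment estimates still happens on top of it, so the two mechanisms are not mutually exclusive.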