How to change a learning rate for Adam in TF2?


Question


How to change the learning rate of the Adam optimizer while training is in progress in TF2? There are some answers floating around, but they apply to TF1, e.g. using feed_dict.


Answer 1:


You can read and assign the learning rate via a callback. So you can use something like this:

import tensorflow as tf

class LearningRateReducerCb(tf.keras.callbacks.Callback):

  def on_epoch_end(self, epoch, logs=None):
    # Read the optimizer's current learning rate, shrink it by 1%, and write it back.
    old_lr = self.model.optimizer.lr.read_value()
    new_lr = old_lr * 0.99
    print("\nEpoch: {}. Reducing Learning Rate from {} to {}".format(epoch, old_lr, new_lr))
    self.model.optimizer.lr.assign(new_lr)

This can be applied, for example, to the MNIST demo like this:

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, callbacks=[LearningRateReducerCb()], epochs=5)

model.evaluate(x_test, y_test)

giving output like this:

Train on 60000 samples
Epoch 1/5
59744/60000 [============================>.] - ETA: 0s - loss: 0.2969 - accuracy: 0.9151
Epoch: 0. Reducing Learning Rate from 0.0010000000474974513 to 0.0009900000877678394
60000/60000 [==============================] - 6s 92us/sample - loss: 0.2965 - accuracy: 0.9152
Epoch 2/5
59488/60000 [============================>.] - ETA: 0s - loss: 0.1421 - accuracy: 0.9585
Epoch: 1. Reducing Learning Rate from 0.0009900000877678394 to 0.000980100128799677
60000/60000 [==============================] - 5s 91us/sample - loss: 0.1420 - accuracy: 0.9586
Epoch 3/5
59968/60000 [============================>.] - ETA: 0s - loss: 0.1056 - accuracy: 0.9684
Epoch: 2. Reducing Learning Rate from 0.000980100128799677 to 0.0009702991228550673
60000/60000 [==============================] - 5s 91us/sample - loss: 0.1056 - accuracy: 0.9684
Epoch 4/5
59520/60000 [============================>.] - ETA: 0s - loss: 0.0856 - accuracy: 0.9734
Epoch: 3. Reducing Learning Rate from 0.0009702991228550673 to 0.0009605961386114359
60000/60000 [==============================] - 5s 89us/sample - loss: 0.0857 - accuracy: 0.9733
Epoch 5/5
59712/60000 [============================>.] - ETA: 0s - loss: 0.0734 - accuracy: 0.9772
Epoch: 4. Reducing Learning Rate from 0.0009605961386114359 to 0.0009509901865385473
60000/60000 [==============================] - 5s 87us/sample - loss: 0.0733 - accuracy: 0.9772
10000/10000 [==============================] - 0s 43us/sample - loss: 0.0768 - accuracy: 0.9762
[0.07680597708942369, 0.9762]



Answer 2:


If you want to use low-level control and not the fit functionality with callbacks, have a look at tf.optimizers.schedules. Here's some example code:

import tensorflow as tf

train_steps = 25000
# PolynomialDecay(initial_learning_rate, decay_steps, end_learning_rate, power)
lr_fn = tf.optimizers.schedules.PolynomialDecay(1e-3, train_steps, 1e-5, 2)
opt = tf.optimizers.Adam(lr_fn)

This would decay the learning rate from 1e-3 to 1e-5 over 25000 steps with a power-2 polynomial decay.
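
As a quick check (an addition, not part of the original answer), the schedule object is callable with a step number, so you can print the decayed value at a few sample steps:

for step in (0, 12500, 25000):
  # roughly 1e-3 at step 0, ~2.6e-4 halfway, and 1e-5 at the end
  print(step, float(lr_fn(step)))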

Note:

  • This doesn't really "store" a learning rate as in the other answer; instead, the learning rate is now a function that is called every time the current learning rate needs to be computed.
  • Optimizer instances have an internal step counter that counts up by one each time apply_gradients is called (as far as I can tell...). This allows the procedure to work properly when using it in a low-level context (usually with tf.GradientTape); a minimal sketch follows this list.
  • Unfortunately this feature is not well documented (the docs just say that the learning rate argument has to be a float or tensor...) but it works. You can also write your own decay schedules. I think they just need to be functions that take in some current "state" of the optimizer (probably the number of training steps) and return a float to be used as the learning rate.
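
To make the low-level usage concrete, here is a minimal sketch (my addition, not from the original answer) of a custom training step that uses the opt defined above; the model and loss_fn here are placeholders, e.g. the Sequential model from the first answer with a sparse categorical cross-entropy loss:

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

@tf.function
def train_step(x, y):
  with tf.GradientTape() as tape:
    preds = model(x, training=True)  # model outputs softmax probabilities
    loss = loss_fn(y, preds)
  grads = tape.gradient(loss, model.trainable_variables)
  # apply_gradients advances opt.iterations, which the schedule reads as the current step
  opt.apply_gradients(zip(grads, model.trainable_variables))
  return loss

Each call to train_step bumps the optimizer's step counter, so the learning rate decays automatically without any manual bookkeeping.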


Source: https://stackoverflow.com/questions/57301698/how-to-change-a-learning-rate-for-adam-in-tf2
