Today I added learning rate decay to my LSTM in TensorFlow.
I changed
train_op = tf.train.RMSPropOptimizer(lr_rate).minimize(loss)
to rebuild the optimizer with a freshly decayed lr_rate inside the training loop, and training slowed down dramatically.
I'm not sure why this slows the process down so much, but according to https://github.com/tensorflow/tensorflow/issues/1439 the likely cause is that constantly creating new graph nodes makes the graph grow (and get slower) on every iteration.
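You can see the growth with a small experiment of my own (this snippet is my illustration, not code from the original setup):

import tensorflow as tf  # TF 1.x

x = tf.Variable(1.0)
loss = tf.square(x)
for lr in (1e-3, 9e-4, 8e-4):  # stand-in for a decaying schedule
    tf.train.RMSPropOptimizer(lr).minimize(loss)
    # the node count keeps climbing on every iteration
    print(len(tf.get_default_graph().as_graph_def().node))

Either way, I think it is better to feed the learning rate in through a placeholder with feed_dict: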
learn_rate = tf.placeholder(tf.float32, shape=[])
optimizer = tf.train.AdamOptimizer(learn_rate)
minimizer = optimizer.minimize(loss)
...
learnrate = 1e-5
...
sess.run(minimizer, feed_dict={learn_rate: learnrate})
I use this approach and see no performance issues. Moreover, since you feed in an arbitrary number on every run call, you can even increase or decrease the learning rate based on the error on the training/validation data.
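Putting it together, here is a minimal runnable sketch; the toy quadratic model, the decay factor, and the epoch count are placeholders of my own, not the original LSTM setup:

import tensorflow as tf  # TF 1.x

# Toy model standing in for the LSTM (assumption for illustration).
x = tf.Variable(5.0)
loss = tf.square(x)

learn_rate = tf.placeholder(tf.float32, shape=[])
optimizer = tf.train.AdamOptimizer(learn_rate)
minimizer = optimizer.minimize(loss)

learnrate = 1e-5
decay = 0.97  # assumed per-epoch decay factor

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10):  # assumed number of epochs
        _, cur_loss = sess.run([minimizer, loss],
                               feed_dict={learn_rate: learnrate})
        learnrate *= decay  # plain Python update, no new graph nodes

The key point is that learnrate is updated in ordinary Python between run calls, so the graph is built once and stays fixed.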