How to accumulate and apply gradients for an async n-step DQNetwork update in TensorFlow?

Submitted by 蓝咒 on 2019-12-05 04:04:13

You don't really have to accumulate gradients manually. You can have TensorFlow accumulate them for you by applying the whole rollout update as a single batch.

# Batched rollout update: feed the entire n-step trajectory in one call.
s_list = list_of_states_visited   # states from the rollout
a_list = list_of_actions_taken    # actions taken at those states
R_list = list_of_value_targets    # n-step value targets for those states

sess.run(local_net.update, feed_dict={
    local_net.input: s_list,
    local_net.a: a_list,
    local_net.R: R_list
})
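For completeness, here is a hedged sketch of one common way to build the `R_list` value targets for an n-step rollout: iterate backwards over the rewards, bootstrapping from the value estimate of the state that follows the rollout. The function name, `gamma`, and `bootstrap_value` are my own choices for illustration, not part of the original answer.

```python
def n_step_targets(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns, computed backwards from the
    bootstrap value of the state after the rollout."""
    R = bootstrap_value
    targets = []
    for r in reversed(rewards):
        R = r + gamma * R   # R_t = r_t + gamma * R_{t+1}
        targets.append(R)
    targets.reverse()       # restore time order
    return targets

# e.g. rewards [1, 0, 1], bootstrap 0.5, gamma 0.9:
# targets[2] = 1 + 0.9*0.5  = 1.45
# targets[1] = 0 + 0.9*1.45 = 1.305
# targets[0] = 1 + 0.9*1.305 = 2.1745
```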

Something like this might work to create ops for accumulating gradients, resetting the accumulated gradients, and applying the accumulated gradients (untested!):

def build_gradient_accumulators(optimizer, gradients_and_variables):
    accum_grads_and_vars = []
    accum_ops = []
    reset_ops = []

    for grad, var in gradients_and_variables:
        # Non-trainable variable that holds the running gradient sum.
        accum = tf.Variable(tf.zeros_like(grad), trainable=False)
        accum_ops.append(accum.assign_add(grad))
        accum_grads_and_vars.append((accum, var))
        reset_ops.append(accum.assign(tf.zeros_like(accum)))

    reset_op = tf.group(*reset_ops)
    accum_op = tf.group(*accum_ops)
    apply_op = optimizer.apply_gradients(accum_grads_and_vars)
    return reset_op, accum_op, apply_op
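To make the intended call order concrete (run `accum_op` once per step, then `apply_op` followed by `reset_op` at the end of the rollout), here is a framework-free toy model of the same accumulate/apply/reset pattern. It is only a sketch of the semantics; the class and its plain-SGD `apply` step are my own illustration, not TensorFlow API.

```python
class GradAccumulator:
    """Toy stand-in for the three ops above, using plain Python lists."""
    def __init__(self, n_params):
        self.acc = [0.0] * n_params

    def accumulate(self, grads):       # like running accum_op each step
        self.acc = [a + g for a, g in zip(self.acc, grads)]

    def apply(self, params, lr=0.1):   # like running apply_op (plain SGD step)
        return [p - lr * a for p, a in zip(params, self.acc)]

    def reset(self):                   # like running reset_op afterwards
        self.acc = [0.0] * len(self.acc)
```

Usage mirrors the async n-step loop: accumulate over each step of the rollout, then apply once and reset before the next rollout.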