Neural Network Reinforcement Learning Requiring Next-State Propagation For Backpropagation

Submitted by 南笙酒味 on 2019-12-23 11:53:28

Question


I am attempting to construct a neural network combining convolutional and LSTM layers (using the Torch library), to be trained by Q-learning or advantage learning, both of which require propagating state T+1 through the network before updating the weights for state T.
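For concreteness, here is a minimal sketch of that target computation. It is written in modern PyTorch purely for illustration (the question uses the older Lua Torch), and the names `q_net`, `state_t1`, and `reward` are hypothetical:

```python
import torch

def q_learning_target(q_net, reward, state_t1, gamma=0.99):
    # One-step Q-learning target: y = r + gamma * max_a Q(s_{T+1}, a).
    # Computing the max over actions requires a full forward pass on
    # state T+1 -- the extra propagation the question describes.
    # q_net, reward, state_t1 are hypothetical names for illustration.
    with torch.no_grad():            # the target pass needs no gradients
        q_next = q_net(state_t1)
    return reward + gamma * q_next.max(dim=1).values
```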

Having to do an extra propagation would cut performance, which is bad but not too bad. However, the problem is that all kinds of state are bound up in this. First, the Torch implementation of backpropagation has some efficiency shortcuts that rely on the backward pass happening immediately after the forward pass, and an additional propagation would disrupt that. I could possibly get around this by having a secondary cloned network that shares the weight values, but then we come to the second problem.
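A rough sketch of that cloned-network workaround, again in PyTorch for illustration. DQN-style target networks make the copy explicit and re-sync it periodically rather than sharing weight storage the way a Lua Torch clone('weight','bias') would; `q_net` is assumed to be defined elsewhere:

```python
import copy

# Hypothetical setup: q_net is the live network, defined elsewhere.
target_net = copy.deepcopy(q_net)   # frozen copy used only for the T+1 pass
for p in target_net.parameters():
    p.requires_grad_(False)         # the clone is never trained directly

def sync_target(q_net, target_net):
    # Periodically refresh the clone so it tracks the live weights.
    target_net.load_state_dict(q_net.state_dict())
```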

Every forward propagation involving LSTMs is stateful. How can I update the weights for time T when propagating network(T+1) may have changed the contents of the LSTMs? I have tried to follow the discussion of TD weight updates as done in TD-Gammon, but it is opaque to me, and in any case it applies to feedforward networks, not recurrent ones.
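One way to see the issue, sketched in PyTorch with all names hypothetical: PyTorch's nn.LSTM takes the recurrent state as an explicit (h, c) tuple and returns the new state rather than mutating internal buffers, so the T+1 "peek" can simply discard the state it produces. A stateful implementation (such as the Lua rnn package's nn.LSTM) would instead need to snapshot and restore its internal state around that pass.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
head = nn.Linear(128, 4)            # hypothetical Q-value head, 4 actions

def q_values_with_peek(x_t, x_t1, state):
    out_t, state_after_t = lstm(x_t, state)    # forward at time T (gradients tracked)
    q_t = head(out_t)
    with torch.no_grad():
        out_t1, _ = lstm(x_t1, state_after_t)  # peek at T+1; its state is discarded
        q_t1 = head(out_t1)                    # target-side values, no gradients
    # The episode continues from state_after_t, untouched by the peek above.
    return q_t, q_t1, state_after_t
```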

How can I update the weights of a network at T without having to advance the network to T+1, or how do I advance the network to T+1 and then go back and adjust the weights as if it were still T?

Source: https://stackoverflow.com/questions/32082506/neural-network-reinforcement-learning-requiring-next-state-propagation-for-backp
