Q-learning

How can I apply reinforcement learning to continuous action spaces?

。_饼干妹妹 submitted on 2019-11-29 20:28:31
I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning). I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space. I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space.
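A minimal sketch of that discretisation trade-off, with bin counts chosen arbitrarily for illustration: even one coarse step-size grid combined with one direction grid multiplies into tens of thousands of discrete actions.

    import itertools
    import numpy as np

    # Arbitrary, purely illustrative bins: pixel step sizes 1..200 and headings
    # every 1 degree. Each (magnitude, angle) pair becomes one discrete action.
    magnitudes = np.arange(1, 201)             # 200 step sizes in pixels
    angles = np.deg2rad(np.arange(0, 360))     # 360 directions

    actions = [(m * np.cos(a), m * np.sin(a))  # (dx, dy) mouse displacement
               for m, a in itertools.product(magnitudes, angles)]
    print(len(actions))                        # 72000 discrete actions already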

What is the difference between Q-learning and SARSA?

拈花ヽ惹草 submitted on 2019-11-29 18:57:05
Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (to me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and Barto), in the SARSA algorithm, given a policy, the corresponding action-value function Q (in state s and action a, at timestep t), i.e. Q(s_t, a_t), can be updated as follows: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)). On the other hand, the update step for the Q-learning algorithm is the following: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*max_a Q(s_{t+1}, a) - Q(s_t, a_t)).
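A minimal tabular sketch of the difference (function and variable names are mine, not from the book): both updates share the same TD-error form, but SARSA bootstraps from the action the policy will actually take next, while Q-learning bootstraps from the greedy action.

    import numpy as np

    # Q is a |S| x |A| table; (s, a, r, s_next) is one observed transition.
    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
        """On-policy: the bootstrap uses a_next, the action the policy actually selects."""
        td_target = r + gamma * Q[s_next, a_next]
        Q[s, a] += alpha * (td_target - Q[s, a])

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
        """Off-policy: the bootstrap uses the greedy action, whatever is taken next."""
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])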

How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration?

不羁岁月 submitted on 2019-11-29 01:41:15
EDIT (1/3/16): corresponding github issue. I'm using Tensorflow (Python interface) to implement a q-learning agent with function approximation, trained using stochastic gradient descent. At each iteration of the experiment, a step function in the agent is called that updates the parameters of the approximator based on the new reward and activation, and then chooses a new action to perform. Here is the problem (in reinforcement learning jargon): the agent computes its state-action value predictions to choose an action. It then gives control back to another program, which simulates a step in the environment.
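A rough sketch of one way to structure such a step function so that a single forward pass feeds both the SGD update and the next action choice. This uses the eager tf.GradientTape API from TensorFlow 2.x, not the graph/session interface the question is about, and the QAgent class, layer sizes and hyperparameters are invented here for illustration.

    import numpy as np
    import tensorflow as tf  # assumes TF 2.x eager mode, not the 1.x graph API in the question

    class QAgent:
        def __init__(self, n_features, n_actions, lr=0.01, gamma=0.99, epsilon=0.1):
            self.model = tf.keras.Sequential([
                tf.keras.Input(shape=(n_features,)),
                tf.keras.layers.Dense(32, activation="relu"),
                tf.keras.layers.Dense(n_actions),
            ])
            self.optimizer = tf.keras.optimizers.SGD(lr)
            self.n_actions, self.gamma, self.epsilon = n_actions, gamma, epsilon
            self.prev_state = None   # previous state as a 1 x n_features tensor
            self.prev_action = None  # action taken in that state

        def step(self, state, reward):
            # state is a 1-D numpy feature vector; reward is the reward just received
            state_t = tf.convert_to_tensor(state[None, :], dtype=tf.float32)
            if self.prev_state is None:
                q_now = self.model(state_t)  # first call: nothing to update yet
            else:
                with tf.GradientTape() as tape:
                    # One batched forward pass covers both the update and the action choice.
                    q_both = self.model(tf.concat([self.prev_state, state_t], axis=0))
                    q_prev = q_both[0, self.prev_action]  # prediction being corrected
                    q_now = q_both[1:2]                   # reused below for acting
                    target = reward + self.gamma * tf.reduce_max(tf.stop_gradient(q_now))
                    loss = tf.square(target - q_prev)
                grads = tape.gradient(loss, self.model.trainable_variables)
                self.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))
            # epsilon-greedy choice from the Q-values already computed above
            if np.random.rand() < self.epsilon:
                action = np.random.randint(self.n_actions)
            else:
                action = int(tf.argmax(q_now[0]).numpy())
            self.prev_state, self.prev_action = state_t, action
            return action

Note that the previous state's activations are still recomputed once inside the tape (that is what the gradients need); the point of the sketch is only that the action-selection pass and the training pass are not duplicated within a single step.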
