Q-learning

How can I apply reinforcement learning to continuous action spaces?

。_饼干妹妹 submitted on 2019-11-29 20:28:31
I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning). I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space. I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space.
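A minimal sketch of that discretisation trade-off, with bin counts chosen arbitrarily for illustration: even one coarse step-size grid combined with one direction grid multiplies into tens of thousands of discrete actions.

    import itertools
    import numpy as np

    # Arbitrary, purely illustrative bins: pixel step sizes 1..200 and headings
    # every 1 degree. Each (magnitude, angle) pair becomes one discrete action.
    magnitudes = np.arange(1, 201)             # 200 step sizes in pixels
    angles = np.deg2rad(np.arange(0, 360))     # 360 directions

    actions = [(m * np.cos(a), m * np.sin(a))  # (dx, dy) mouse displacement
               for m, a in itertools.product(magnitudes, angles)]
    print(len(actions))                        # 72000 discrete actions already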

What is the difference between Q-learning and SARSA?

拈花ヽ惹草 submitted on 2019-11-29 18:57:05
Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (to me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and Barto), in the SARSA algorithm, given a policy, the corresponding action-value function Q (in state s and action a, at timestep t), i.e. Q(s_t, a_t), can be updated as follows: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)). On the other hand, the update step for the Q-learning algorithm is the following: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*max_a Q(s_{t+1}, a) - Q(s_t, a_t)).
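A minimal tabular sketch of the difference (function and variable names are mine, not from the book): both updates share the same TD-error form, but SARSA bootstraps from the action the policy will actually take next, while Q-learning bootstraps from the greedy action.

    import numpy as np

    # Q is a |S| x |A| table; (s, a, r, s_next) is one observed transition.
    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
        """On-policy: the bootstrap uses a_next, the action the policy actually selects."""
        td_target = r + gamma * Q[s_next, a_next]
        Q[s, a] += alpha * (td_target - Q[s, a])

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
        """Off-policy: the bootstrap uses the greedy action, whatever is taken next."""
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])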

How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration?

不羁岁月 submitted on 2019-11-29 01:41:15
EDIT (1/3/16): corresponding github issue. I'm using Tensorflow (Python interface) to implement a q-learning agent with function approximation, trained using stochastic gradient descent. At each iteration of the experiment, a step function in the agent is called that updates the parameters of the approximator based on the new reward and activation, and then chooses a new action to perform. Here is the problem (in reinforcement learning jargon): the agent computes its state-action value predictions to choose an action. It then gives control back to another program, which simulates a step in the environment.
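A rough sketch of one way to structure such a step function so that a single forward pass feeds both the SGD update and the next action choice. This uses the eager tf.GradientTape API from TensorFlow 2.x, not the graph/session interface the question is about, and the QAgent class, layer sizes and hyperparameters are invented here for illustration.

    import numpy as np
    import tensorflow as tf  # assumes TF 2.x eager mode, not the 1.x graph API in the question

    class QAgent:
        def __init__(self, n_features, n_actions, lr=0.01, gamma=0.99, epsilon=0.1):
            self.model = tf.keras.Sequential([
                tf.keras.Input(shape=(n_features,)),
                tf.keras.layers.Dense(32, activation="relu"),
                tf.keras.layers.Dense(n_actions),
            ])
            self.optimizer = tf.keras.optimizers.SGD(lr)
            self.n_actions, self.gamma, self.epsilon = n_actions, gamma, epsilon
            self.prev_state = None   # previous state as a 1 x n_features tensor
            self.prev_action = None  # action taken in that state

        def step(self, state, reward):
            # state is a 1-D numpy feature vector; reward is the reward just received
            state_t = tf.convert_to_tensor(state[None, :], dtype=tf.float32)
            if self.prev_state is None:
                q_now = self.model(state_t)  # first call: nothing to update yet
            else:
                with tf.GradientTape() as tape:
                    # One batched forward pass covers both the update and the action choice.
                    q_both = self.model(tf.concat([self.prev_state, state_t], axis=0))
                    q_prev = q_both[0, self.prev_action]  # prediction being corrected
                    q_now = q_both[1:2]                   # reused below for acting
                    target = reward + self.gamma * tf.reduce_max(tf.stop_gradient(q_now))
                    loss = tf.square(target - q_prev)
                grads = tape.gradient(loss, self.model.trainable_variables)
                self.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))
            # epsilon-greedy choice from the Q-values already computed above
            if np.random.rand() < self.epsilon:
                action = np.random.randint(self.n_actions)
            else:
                action = int(tf.argmax(q_now[0]).numpy())
            self.prev_state, self.prev_action = state_t, action
            return action

Note that the previous state's activations are still recomputed once inside the tape (that is what the gradients need); the point of the sketch is only that the action-selection pass and the training pass are not duplicated within a single step.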
