sarsa

Are Q-learning and SARSA with greedy selection equivalent?

痴心易碎 submitted on 2020-05-25 07:26:30
Question: The difference between Q-learning and SARSA is that Q-learning compares the current state against the best possible next state, whereas SARSA compares the current state against the actual next state. If a greedy selection policy is used, that is, the action with the highest action value is selected 100% of the time, are SARSA and Q-learning then identical? Answer 1: Well, not exactly. A key difference between SARSA and Q-learning is that SARSA is an on-policy algorithm (it follows the policy that is being learned) …
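For reference, the two update targets the question is contrasting can be written down directly. The sketch below is not from the original post; it assumes a tabular Q stored as a NumPy array and simply shows that the SARSA target uses the action actually taken next, while the Q-learning target uses the maximizing action, so with purely greedy action selection the two targets evaluate to the same number.

```python
import numpy as np

def sarsa_target(Q, r, s_next, a_next, gamma=0.99):
    # SARSA bootstraps on the action actually taken in s_next (on-policy).
    return r + gamma * Q[s_next, a_next]

def q_learning_target(Q, r, s_next, gamma=0.99):
    # Q-learning bootstraps on the greedy action in s_next (off-policy).
    return r + gamma * np.max(Q[s_next])

# Toy check: with purely greedy action selection the two targets coincide.
Q = np.array([[0.0, 1.0], [0.5, 2.0]])      # 2 states x 2 actions
s_next, r = 1, 0.0
greedy_a_next = int(np.argmax(Q[s_next]))   # greedy selection, no exploration
print(sarsa_target(Q, r, s_next, greedy_a_next) == q_learning_target(Q, r, s_next))  # True
```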

What is the difference between Q-learning and SARSA?

只谈情不闲聊 submitted on 2019-12-29 02:26:23
Question: Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (for me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and Barto), in the SARSA algorithm, given a policy, the corresponding action-value function Q (in state s and action a, at timestep t), i.e. Q(s_t, a_t), can be updated as follows: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t))
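As an illustration only (not code from the question), that SARSA update can be transcribed into a small tabular function. The a_next argument must be the action that the behaviour policy, e.g. ε-greedy, will actually execute in s_{t+1}; using that action rather than the maximizing one is what makes SARSA on-policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, epsilon=0.1):
    # Behaviour policy: explore with probability epsilon, otherwise act greedily.
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (r_t + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t))
    # a_next is the action the agent will actually take in s_next (on-policy).
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
    return Q
```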

How to understand the RLstep in Keepaway (Compare with Sarsa)

半腔热情 submitted on 2019-12-24 08:37:07
问题 In "Stone, Peter, Richard S. Sutton, and Gregory Kuhlmann. "Reinforcement learning for robocup soccer keepaway." Adaptive Behavior 13.3 (2005): 165-188.", the RLstep pseudocode seems quite a bit different from Sarsa(λ), which the authors say RLStep implements. Here is the RLstep pseudocode and here is the Sarsa(lambda) pseudocode. The areas of confusion are: Line 10 in the Sarsa(λ) pseudocode updates the Q value for each state-action pair after adding 1 to the e(s,a) . But in the RLstep

What is the difference between Q-learning and SARSA?

拈花ヽ惹草 submitted on 2019-11-29 18:57:05
Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (for me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and Barto), in the SARSA algorithm, given a policy, the corresponding action-value function Q (in state s and action a, at timestep t), i.e. Q(s_t, a_t), can be updated as follows: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)). On the other hand, the update step for the Q-learning algorithm is the following: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*max_a Q(s_{t+1}, a) - Q(s_t, a_t))
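To make the contrast in the excerpt concrete, here is an illustrative tabular sketch of the Q-learning update (not code from the original question); compared with the SARSA function above, the only change is that the bootstrap term takes the max over next actions instead of using the action the behaviour policy actually takes next.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (r_t + gamma * max_a' Q(s_{t+1}, a') - Q(s_t, a_t))
    # The max over next actions is taken regardless of which action is executed next (off-policy).
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q
```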