sarsa

Are Q-learning and SARSA with greedy selection equivalent?

痴心易碎 submitted on 2020-05-25 07:26:30
Question: The difference between Q-learning and SARSA is that Q-learning compares the current state against the best possible next state, whereas SARSA compares the current state against the actual next state. If a greedy selection policy is used, that is, the action with the highest action value is selected 100% of the time, are SARSA and Q-learning then identical? Answer 1: Well, not exactly. A key difference between SARSA and Q-learning is that SARSA is an on-policy algorithm (it follows the policy that is being learned) …
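For reference, the two update targets the question is contrasting can be written down directly. The sketch below is not from the original post; it assumes a tabular Q stored as a NumPy array and simply shows that the SARSA target uses the action actually taken next, while the Q-learning target uses the maximizing action, so with purely greedy action selection the two targets evaluate to the same number.

```python
import numpy as np

def sarsa_target(Q, r, s_next, a_next, gamma=0.99):
    # SARSA bootstraps on the action actually taken in s_next (on-policy).
    return r + gamma * Q[s_next, a_next]

def q_learning_target(Q, r, s_next, gamma=0.99):
    # Q-learning bootstraps on the greedy action in s_next (off-policy).
    return r + gamma * np.max(Q[s_next])

# Toy check: with purely greedy action selection the two targets coincide.
Q = np.array([[0.0, 1.0], [0.5, 2.0]])      # 2 states x 2 actions
s_next, r = 1, 0.0
greedy_a_next = int(np.argmax(Q[s_next]))   # greedy selection, no exploration
print(sarsa_target(Q, r, s_next, greedy_a_next) == q_learning_target(Q, r, s_next))  # True
```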

What is the difference between Q-learning and SARSA?

只谈情不闲聊 submitted on 2019-12-29 02:26:23
Question: Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (for me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and Barto), in the SARSA algorithm, given a policy, the corresponding action-value function Q (in state s and action a, at timestep t), i.e. Q(s_t, a_t), can be updated as follows: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t))
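As an illustration only (not code from the question), that SARSA update can be transcribed into a small tabular function. The a_next argument must be the action that the behaviour policy, e.g. ε-greedy, will actually execute in s_{t+1}; using that action rather than the maximizing one is what makes SARSA on-policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, epsilon=0.1):
    # Behaviour policy: explore with probability epsilon, otherwise act greedily.
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (r_t + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t))
    # a_next is the action the agent will actually take in s_next (on-policy).
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
    return Q
```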

How to understand the RLstep in Keepaway (Compare with Sarsa)

半腔热情 submitted on 2019-12-24 08:37:07
问题 In "Stone, Peter, Richard S. Sutton, and Gregory Kuhlmann. "Reinforcement learning for robocup soccer keepaway." Adaptive Behavior 13.3 (2005): 165-188.", the RLstep pseudocode seems quite a bit different from Sarsa(λ), which the authors say RLStep implements. Here is the RLstep pseudocode and here is the Sarsa(lambda) pseudocode. The areas of confusion are: Line 10 in the Sarsa(λ) pseudocode updates the Q value for each state-action pair after adding 1 to the e(s,a) . But in the RLstep

What is the difference between Q-learning and SARSA?

拈花ヽ惹草 submitted on 2019-11-29 18:57:05
Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard (for me) to see any difference between these two algorithms. According to the book Reinforcement Learning: An Introduction (by Sutton and Barto), in the SARSA algorithm, given a policy, the corresponding action-value function Q (in state s and action a, at timestep t), i.e. Q(s_t, a_t), can be updated as follows: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)). On the other hand, the update step for the Q-learning algorithm is the following: Q(s_t, a_t) = Q(s_t, a_t) + α*(r_t + γ*max_a Q(s_{t+1}, a) - Q(s_t, a_t))
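To make the contrast in the excerpt concrete, here is an illustrative tabular sketch of the Q-learning update (not code from the original question); compared with the SARSA function above, the only change is that the bootstrap term takes the max over next actions instead of using the action the behaviour policy actually takes next.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (r_t + gamma * max_a' Q(s_{t+1}, a') - Q(s_t, a_t))
    # The max over next actions is taken regardless of which action is executed next (off-policy).
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q
```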