How to understand the RLstep in Keepaway (compared with Sarsa)


Question


In "Stone, Peter, Richard S. Sutton, and Gregory Kuhlmann. "Reinforcement learning for robocup soccer keepaway." Adaptive Behavior 13.3 (2005): 165-188.", the RLstep pseudocode seems quite a bit different from Sarsa(λ), which the authors say RLStep implements.

Here is the RLstep pseudocode and here is the Sarsa(λ) pseudocode.
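For reference, below is a minimal tabular Sarsa(λ) sketch with accumulating traces, as I understand the textbook algorithm. It is not the paper's RLstep, its lines do not correspond to the line numbers cited below, and the environment interface (`env.reset()`, `env.step()`) and parameter names are just placeholders for illustration.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, n_actions):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) with accumulating traces (textbook form, not RLstep)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)                 # eligibility traces, reset each episode
        s = env.reset()                      # assumed interface: returns a state index
        a = epsilon_greedy(Q, s, epsilon, n_actions)
        done = False
        while not done:
            s_next, r, done = env.step(a)    # assumed interface: (state, reward, done)
            a_next = epsilon_greedy(Q, s_next, epsilon, n_actions)
            # On-policy TD error: the target uses the action actually selected next
            delta = r + (0.0 if done else gamma * Q[s_next, a_next]) - Q[s, a]
            e[s, a] += 1.0                   # add 1 to the trace of the visited pair first...
            Q += alpha * delta * e           # ...then update every Q(s, a) by its trace
            e *= gamma * lam                 # and decay all traces
            s, a = s_next, a_next
    return Q
```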

The areas of confusion are:

  • Line 10 in the Sarsa(λ) pseudocode updates the Q value for every state-action pair after adding 1 to e(s,a), but in the RLstep pseudocode the eligibility-trace update (line 19) doesn't happen until after the value update (line 17).

  • Lines 18 and 19 in RLstep seem quite different from the Sarsa(λ) pseudocode.

  • What are lines 20-25 doing with the eligibility trace? (A sketch of the linear function-approximation setting follows after this list.)
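For comparison, here is a rough sketch of a single Sarsa(λ) update in the linear function-approximation setting with binary (tile-coded) features that the paper works in. The function name, the feature-index layout, and the trace clearing for the other actions' tiles are my own illustrative assumptions, not the paper's exact code.

```python
import numpy as np

def linear_sarsa_lambda_step(theta, e, features, a, r, next_q, done,
                             alpha=0.1, gamma=1.0, lam=0.9):
    """One Sarsa(lambda) update for a linear value function over binary features.

    theta    -- weight vector, one weight per tile/feature
    e        -- eligibility-trace vector, same shape as theta
    features -- dict: action -> index array of the tiles active in the current state
    a        -- action taken in the current state
    next_q   -- Q-value of the action selected in the next state (ignored if done)
    """
    q = theta[features[a]].sum()                     # Q(s, a) = sum of active weights
    delta = r + (0.0 if done else gamma * next_q) - q
    # Replacing traces: clear the traces of this state's tiles for the other
    # actions (a common replacing-trace variant), then set the chosen action's
    # tiles to 1.
    for other_a, idx in features.items():
        if other_a != a:
            e[idx] = 0.0
    e[features[a]] = 1.0
    theta += alpha * delta * e                       # single weight-vector update
    e *= gamma * lam                                 # decay all traces
    return theta, e
```

With function approximation there is no per-(s,a) table to loop over: the value update is one vector operation on the weights, and traces are kept per feature rather than per state-action pair, which may be why the listing is structured differently from the tabular pseudocode.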

Source: https://stackoverflow.com/questions/40166586/how-to-understand-the-rlstep-in-keepaway-compare-with-sarsa
