How to understand the RLstep in Keepaway (compared with Sarsa)


Question


In "Stone, Peter, Richard S. Sutton, and Gregory Kuhlmann. "Reinforcement learning for robocup soccer keepaway." Adaptive Behavior 13.3 (2005): 165-188.", the RLstep pseudocode seems quite a bit different from Sarsa(λ), which the authors say RLStep implements.

Here is the RLstep pseudocode and here is the Sarsa(λ) pseudocode.
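For reference, below is a minimal tabular Sarsa(λ) sketch with accumulating traces, as I understand the textbook algorithm. It is not the paper's RLstep, its lines do not correspond to the line numbers cited below, and the environment interface (`env.reset()`, `env.step()`) and parameter names are just placeholders for illustration.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, n_actions):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) with accumulating traces (textbook form, not RLstep)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)                 # eligibility traces, reset each episode
        s = env.reset()                      # assumed interface: returns a state index
        a = epsilon_greedy(Q, s, epsilon, n_actions)
        done = False
        while not done:
            s_next, r, done = env.step(a)    # assumed interface: (state, reward, done)
            a_next = epsilon_greedy(Q, s_next, epsilon, n_actions)
            # On-policy TD error: the target uses the action actually selected next
            delta = r + (0.0 if done else gamma * Q[s_next, a_next]) - Q[s, a]
            e[s, a] += 1.0                   # add 1 to the trace of the visited pair first...
            Q += alpha * delta * e           # ...then update every Q(s, a) by its trace
            e *= gamma * lam                 # and decay all traces
            s, a = s_next, a_next
    return Q
```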

The areas of confusion are:

  • Line 10 in the Sarsa(λ) pseudocode updates the Q value for every state-action pair after adding 1 to e(s,a), but in the RLstep pseudocode the eligibility-trace update (line 19) doesn't happen until after the value update (line 17).

  • Lines 18 and 19 in RLstep seem quite different from the Sarsa(λ) pseudocode.

  • What are lines 20-25 doing with the eligibility trace? (A sketch of the linear function-approximation setting follows after this list.)
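For comparison, here is a rough sketch of a single Sarsa(λ) update in the linear function-approximation setting with binary (tile-coded) features that the paper works in. The function name, the feature-index layout, and the trace clearing for the other actions' tiles are my own illustrative assumptions, not the paper's exact code.

```python
import numpy as np

def linear_sarsa_lambda_step(theta, e, features, a, r, next_q, done,
                             alpha=0.1, gamma=1.0, lam=0.9):
    """One Sarsa(lambda) update for a linear value function over binary features.

    theta    -- weight vector, one weight per tile/feature
    e        -- eligibility-trace vector, same shape as theta
    features -- dict: action -> index array of the tiles active in the current state
    a        -- action taken in the current state
    next_q   -- Q-value of the action selected in the next state (ignored if done)
    """
    q = theta[features[a]].sum()                     # Q(s, a) = sum of active weights
    delta = r + (0.0 if done else gamma * next_q) - q
    # Replacing traces: clear the traces of this state's tiles for the other
    # actions (a common replacing-trace variant), then set the chosen action's
    # tiles to 1.
    for other_a, idx in features.items():
        if other_a != a:
            e[idx] = 0.0
    e[features[a]] = 1.0
    theta += alpha * delta * e                       # single weight-vector update
    e *= gamma * lam                                 # decay all traces
    return theta, e
```

With function approximation there is no per-(s,a) table to loop over: the value update is one vector operation on the weights, and traces are kept per feature rather than per state-action pair, which may be why the listing is structured differently from the tabular pseudocode.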

Source: https://stackoverflow.com/questions/40166586/how-to-understand-the-rlstep-in-keepaway-compare-with-sarsa
