Q-Learning values get too high

忘掉有多难 2021-01-06 12:30

I've recently made an attempt to implement a basic Q-Learning algorithm in Golang. Note that I'm new to Reinforcement Learning and AI in general, so the error may very wel

2 answers

长情又很酷 (original poster)

2021-01-06 13:00

The reward function is likely the problem. Reinforcement learning methods try to maximize the expected total reward. Your agent gets a positive reward for every time step in the game, so the optimal policy is to play as long as possible! The Q-values, which define the action-value function (the expected total reward of taking an action in a state and then behaving optimally), keep growing because the correct expectation is unbounded. To incentivize winning, you should give a negative reward on every time step (kind of like telling the agent to hurry up and win).

See Section 3.2, "Goals and Rewards", in Reinforcement Learning: An Introduction (Sutton and Barto) for more insight into the purpose and definition of reward signals. The problem you are facing is actually exercise 3.5 in the book.
