What is the difference between value iteration and policy iteration?

陌清茗 2021-01-29 17:44

In reinforcement learning, what is the difference between policy iteration and value iteration?

As much as I understand, in value iteration, you use the Bellman equation to solve for the optimal policy, whereas in policy iteration, you randomly select a policy π and find the value of that policy.

5 Answers
  •  自闭症患者
    2021-01-29 18:12

    In policy iteration algorithms, you start with a random policy, then find the value function of that policy (policy evaluation step), then find a new (improved) policy based on the previous value function, and so on. In this process, each policy is guaranteed to be a strict improvement over the previous one (unless it is already optimal). Given a policy, its value function can be obtained using the Bellman operator.
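    As a concrete illustration (not part of the original answer), here is a minimal tabular policy-iteration sketch in Python. It assumes a finite MDP given as a transition array P of shape (states, actions, states) and an expected-reward array R of shape (states, actions); the names P, R, and gamma are placeholders for this example.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Minimal tabular policy iteration (sketch, assumed MDP layout).

    P: transition probabilities, shape (S, A, S)
    R: expected immediate rewards, shape (S, A)
    """
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)              # start from an arbitrary policy (all zeros here)
    while True:
        # Policy evaluation: solve the linear Bellman equations V = R_pi + gamma * P_pi V
        R_pi = R[np.arange(S), policy]           # rewards under the current policy, shape (S,)
        P_pi = P[np.arange(S), policy]           # transitions under the current policy, shape (S, S)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V
        Q = R + gamma * P @ V                    # one-step lookahead values, shape (S, A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):   # policy is stable -> it is optimal
            return policy, V
        policy = new_policy
```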

    In value iteration, you start with a random value function and then find a new (improved) value function in an iterative process, until reaching the optimal value function. Notice that you can easily derive the optimal policy from the optimal value function. This process is based on the optimality Bellman operator.
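    For comparison, here is a value-iteration sketch using the same assumed MDP layout: it repeatedly applies the Bellman optimality backup until the value function stops changing, then reads off the greedy policy.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Minimal tabular value iteration (sketch, assumed MDP layout)."""
    S, A, _ = P.shape
    V = np.zeros(S)                              # arbitrary initial value function
    while True:
        Q = R + gamma * P @ V                    # one-step lookahead values, shape (S, A)
        V_new = Q.max(axis=1)                    # Bellman optimality backup (note the max)
        if np.max(np.abs(V_new - V)) < tol:      # stop once the values have (nearly) converged
            V = V_new
            break
        V = V_new
    policy = (R + gamma * P @ V).argmax(axis=1)  # greedy policy from the (near-)optimal values
    return policy, V
```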

    In some sense, both algorithms share the same working principle, and they can be seen as two special cases of generalized policy iteration. However, the optimality Bellman operator contains a max operator, which is non-linear and therefore has different properties. In addition, it is possible to use hybrid methods between pure value iteration and pure policy iteration.
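    For concreteness, the two operators being contrasted can be written as follows (standard definitions in the usual notation, not taken from the original post). Policy evaluation solves the fixed point of the first, linear, operator; value iteration repeatedly applies the second, whose max makes it non-linear:

$$ (T^{\pi} V)(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V(s') \bigr] $$

$$ (T^{*} V)(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V(s') \bigr] $$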
