value-iteration

What is the difference between value iteration and policy iteration?

烈酒焚心 submitted on 2019-12-03 00:07:03
Question: In reinforcement learning, what is the difference between policy iteration and value iteration? As far as I understand, in value iteration you use the Bellman equation to solve for the optimal policy, whereas in policy iteration you randomly select a policy π and find the reward of that policy. My doubt is this: if you select a random policy π in PI, how is it guaranteed to be the optimal policy, even if we choose several random policies?
Answer 1: Let's look at them side by side

What is the difference between value iteration and policy iteration?

a 夏天 submitted on 2019-12-02 13:54:22
Question: In reinforcement learning, what is the difference between policy iteration and value iteration? As far as I understand, in value iteration you use the Bellman equation to solve for the optimal policy, whereas in policy iteration you randomly select a policy π and find the reward of that policy. My doubt is this: if you select a random policy π in PI, how is it guaranteed to be the optimal policy, even if we choose several random policies?
Answer (zyxue): Let's look at them side by side. The key parts for comparison are highlighted. Figures are from Sutton and Barto's book:
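
To make the side-by-side comparison concrete, here is a minimal Python sketch of both algorithms on a hypothetical two-state MDP. The MDP itself, the names `P`, `GAMMA`, `q_value`, and the tolerance are assumptions made up for illustration; they are not taken from the post or copied from the book's pseudocode. The point it tries to show: in policy iteration the starting policy is arbitrary, but it is then repeatedly evaluated and greedily improved, and for a finite MDP that loop stops at an optimal policy. Value iteration instead folds the improvement step into each sweep by taking a max over actions.

```python
# Hypothetical toy MDP (an assumption for illustration only).
# P[state][action] -> list of (probability, next_state, reward)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # discount factor (assumed)


def q_value(V, s, a):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: max over actions in every sweep.
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract the greedy policy once the values have converged.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


def policy_iteration(tol=1e-10):
    policy = {s: 0 for s in P}  # arbitrary starting policy
    V = {s: 0.0 for s in P}
    while True:
        # 1. Policy evaluation: Bellman expectation backup for the fixed policy.
        while True:
            delta = 0.0
            for s in P:
                v = q_value(V, s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # 2. Policy improvement: act greedily with respect to the evaluated V.
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(V, s, a))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:  # no state changed its action, so stop
            return V, policy


vi_values, vi_policy = value_iteration()
pi_values, pi_policy = policy_iteration()
print(vi_policy, pi_policy)
```

On this toy problem both functions should end up with the same greedy policy, even though policy iteration starts from a deliberately poor one. The sketch ignores ties between actions, which a more careful implementation would need to handle so the improvement step cannot oscillate.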