What is the way to understand the Proximal Policy Optimization algorithm in RL?
I know the basics of Reinforcement Learning, but what terms are necessary to understand to be able to read the arXiv PPO paper? What is the roadmap to learn and use PPO?

To better understand PPO, it is helpful to look at the main contributions of the paper, which are: (1) the Clipped Surrogate Objective and (2) the use of "multiple epochs of stochastic gradient ascent to perform each policy update". First, to ground these points in the original PPO paper:

> We have introduced [PPO], a family of policy optimization methods that use multiple epochs of stochastic gradient ascent to perform each policy update.
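To make contribution (1) concrete, here is a minimal NumPy sketch of the Clipped Surrogate Objective; the function name `clipped_surrogate` and its argument names are illustrative, not from the paper, though the 0.2 clipping value matches the value the paper uses:

    import numpy as np

    def clipped_surrogate(ratio, advantage, epsilon=0.2):
        """Clipped Surrogate Objective, L^CLIP, from the PPO paper.

        ratio:     r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t),
                   one entry per sampled timestep
        advantage: advantage estimate A_t for each sampled timestep
        epsilon:   clipping range (the paper uses epsilon = 0.2)
        """
        unclipped = ratio * advantage
        # Clipping removes the incentive to move the probability ratio
        # outside the interval [1 - epsilon, 1 + epsilon].
        clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
        # The elementwise minimum makes this a pessimistic lower bound on
        # the unclipped objective; the mean approximates the expectation.
        return np.mean(np.minimum(unclipped, clipped))

    # Example: a ratio of 1.5 with positive advantage is clipped to 1.2,
    # so pushing the policy further yields no extra objective.
    print(clipped_surrogate(np.array([1.5, 0.9]), np.array([1.0, -0.5])))

Contribution (2) then refers to how this objective is used: because the clipping keeps each update conservative, the same batch of sampled trajectories can safely be reused for several epochs of minibatch stochastic gradient ascent, rather than the single gradient step per sample that vanilla policy gradient methods allow.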