I know that normalizing the observation state returns better results in reinforcement learning Stable-baselines documentation. But I could not find any theoretical backgroun