All the reinforcement learning algorithms I\'ve read about are usually applied to a single agent that has a fixed number of actions. Are there any reinforcement learning algorit
In continuous domain action spaces, the policy NN often outputs the mean and/or the variance, from which you, then, sample the action, assuming it follows a certain distribution.