How can one guarantee that the actor selects a correct action?

In the training phase of the Deep Deterministic Policy Gradient (DDPG) algorithm, the action is selected simply as

action = actor(state)

wher