How to guarantee that the actor would select a correct action?

后端 未结 0 773
北荒
北荒 2021-01-04 08:48

In the training phase of Deep Deterministic Policy Gradient (DDPG) algorithm, the action selection would be simply

action = actor(state)

wher

相关标签:
回答
  • 消灭零回复
提交回复
热议问题