I want to implement the following algorithm, taken from this book, section 13.6:
I don't understand how to implement the update rule in PyTorch (the rule f
I'm going to give this a try.
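For reference, here is the update rule from that box as I remember it (the continuing, average-reward case of section 13.6 — treat this as a paraphrase, not a verbatim copy):

$$
\begin{aligned}
\delta &\leftarrow R - \bar R + \hat v(S', \mathbf w) - \hat v(S, \mathbf w) \\
\bar R &\leftarrow \bar R + \alpha^{\bar R} \delta \\
\mathbf z^{\mathbf w} &\leftarrow \lambda^{\mathbf w} \mathbf z^{\mathbf w} + \nabla \hat v(S, \mathbf w) \\
\mathbf z^{\boldsymbol\theta} &\leftarrow \lambda^{\boldsymbol\theta} \mathbf z^{\boldsymbol\theta} + \nabla \ln \pi(A \mid S, \boldsymbol\theta) \\
\mathbf w &\leftarrow \mathbf w + \alpha^{\mathbf w} \delta\, \mathbf z^{\mathbf w} \\
\boldsymbol\theta &\leftarrow \boldsymbol\theta + \alpha^{\boldsymbol\theta} \delta\, \mathbf z^{\boldsymbol\theta}
\end{aligned}
$$

If you are working from the episodic variant in the previous section instead, the only differences (if I recall correctly) are that δ bootstraps with γ·v̂(S',w), the traces decay with γλ, and the policy trace carries an extra I = γ^t factor.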
`.backward()` does not need a loss function; it just needs a differentiable scalar output, and it computes the gradient of that scalar with respect to the model parameters. Let's look at the first case: the update for the value function.
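To make that concrete, here is a minimal self-contained sketch; the tiny value network and the state tensor are placeholders I made up, not anything from the book:

```python
import torch
import torch.nn as nn

# Stand-in for the value function v̂(s, w): any module with a scalar output works.
model = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))

s = torch.randn(4)       # a made-up state observation
v = model(s).squeeze()   # a scalar output that is not a "loss" in any meaningful sense
v.backward()             # autograd differentiates v with respect to the parameters

for name, p in model.named_parameters():
    print(name, p.grad.shape)   # each gradient has the same shape as its parameter
```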
We have one gradient appearing for v; we can compute this gradient with:
```python
v = model(s)   # v̂(S, w): the scalar value estimate for the current state
v.backward()   # fills p.grad with ∂v/∂p for every model parameter p
```
This fills `p.grad` for every parameter `p` with the gradient of `v`, so each gradient tensor has the same shape as the corresponding parameter. Assuming the remaining quantities already exist (the trace tensors `z_theta`, initialized to zeros with the parameter shapes, and the TD error `delta` — a sketch for computing `delta` follows after the loop), we can turn the trace update into the gradient the optimizer should apply:
```python
for i, p in enumerate(model.parameters()):
    # z ← γλ·z + l·∇v̂  (l is presumably the I = γ^t factor; it is absent for the value trace)
    z_theta[i][:] = gamma * lamda * z_theta[i] + l * p.grad
    # minus sign because opt.step() will *subtract* the gradient (p ← p − lr·grad)
    p.grad[:] = -alpha * delta * z_theta[i]
```
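The quantity the loop takes for granted is the TD error `delta`. A minimal sketch of computing it for the continuing case, with made-up names `s_next`, `r`, `r_bar`, and `alpha_rbar` (none of these come from the book's pseudocode verbatim):

```python
with torch.no_grad():                         # the bootstrap target must not propagate gradients
    v_next = model(s_next).squeeze()
    delta = (r - r_bar + v_next - v).item()   # δ = R − R̄ + v̂(S',w) − v̂(S,w)
r_bar += alpha_rbar * delta                   # update the average-reward estimate R̄
# episodic variant instead: delta = (r + gamma * v_next - v).item()
```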
We can then call `opt.step()` to apply the adjusted gradient; with a plain `torch.optim.SGD(model.parameters(), lr=1.0)` this realizes the book's update w ← w + αδz (or fold `alpha` into `lr` and drop it from the loop, so the step size is not applied twice). Also remember `opt.zero_grad()` before the next backward pass, since `.backward()` accumulates into `p.grad`.
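Putting the pieces together, one possible shape for a single critic update (a sketch under the assumptions above — it reuses the placeholder `model`, uses the continuing-case trace decay `lam` rather than `gamma * lamda`, and is not the book's pseudocode line by line):

```python
import torch

z = [torch.zeros_like(p) for p in model.parameters()]   # one eligibility trace per parameter
opt = torch.optim.SGD(model.parameters(), lr=1.0)       # lr=1: alpha is applied inside the loop

def critic_step(s, s_next, r, r_bar, alpha, alpha_rbar, lam):
    """One value-function update; the actor update is analogous with log π(A|S, θ)."""
    opt.zero_grad()
    v = model(s).squeeze()
    v.backward()                                         # p.grad = ∇v̂(S, w)

    with torch.no_grad():
        v_next = model(s_next).squeeze()
        delta = (r - r_bar + v_next - v).item()          # continuing-case TD error
        r_bar += alpha_rbar * delta

        for zi, p in zip(z, model.parameters()):
            zi.mul_(lam).add_(p.grad)                    # z ← λ z + ∇v̂(S, w)
            p.grad.copy_(-alpha * delta * zi)            # minus: SGD subtracts p.grad

    opt.step()                                           # net effect: w ← w + α δ z
    return delta, r_bar
```

For the episodic version, decay the traces with `gamma * lam`, bootstrap `delta` with `gamma * v_next`, and include the I = γ^t factor in the policy trace.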