Where is an explicit connection between the optimizer and the loss?
How does the optimizer know where to get the gradients of the loss wit
Perhaps this will clarify the connection between loss.backward() and optim.step() a little (although the other answers are to the point).
import torch

# Our "model"
x = torch.tensor([1., 2.], requires_grad=True)
y = 100*x
# Compute loss
loss = y.sum()
# Compute gradients of the loss w.r.t. the parameters
print(x.grad) # None
loss.backward()
print(x.grad) # tensor([100., 100.])
# Modify the parameters by subtracting the gradient (scaled by the learning rate)
optim = torch.optim.SGD([x], lr=0.001)
print(x) # tensor([1., 2.], requires_grad=True)
optim.step()
print(x) # tensor([0.9000, 1.9000], requires_grad=True)
loss.backward() computes the gradient of loss with respect to every leaf tensor with requires_grad=True in the computational graph of which loss is the root (only x in this case), and stores it in that tensor's .grad attribute.
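As a quick check of which tensors in the example above are leaves, here is a minimal sketch using the is_leaf attribute. The name leaf_flags is just an illustrative variable, not part of the original snippet.

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)  # created directly by the user: a leaf
y = 100 * x                                      # result of an operation: not a leaf
loss = y.sum()                                   # root of the graph, also not a leaf

loss.backward()

leaf_flags = (x.is_leaf, y.is_leaf, loss.is_leaf)
print(leaf_flags)  # (True, False, False)
print(x.grad)      # tensor([100., 100.])
```

Only x ends up with a populated .grad here, which is why it is the only tensor worth handing to the optimizer.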
The optimizer just iterates through the list of parameters (tensors) it received on initialization and, for every tensor whose .grad is not None, subtracts the gradient stored in that .grad attribute (simply multiplied by the learning rate in the case of plain SGD). It doesn't need to know with respect to which loss the gradients were computed; it just reads the .grad attribute so it can do x = x - lr * x.grad
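To make that concrete, here is a small sketch that compares optim.step() with the same update written by hand. It assumes plain SGD (no momentum, no weight decay); the names x_sgd, x_manual and same are made up for this illustration.

```python
import torch

lr = 0.001

# Two identical copies of the parameter: one updated by SGD, one by hand.
x_sgd = torch.tensor([1., 2.], requires_grad=True)
x_manual = torch.tensor([1., 2.], requires_grad=True)

# Same "model" and loss as above, evaluated once per copy.
(100 * x_sgd).sum().backward()
(100 * x_manual).sum().backward()

# Update via the optimizer.
optim = torch.optim.SGD([x_sgd], lr=lr)
optim.step()

# Update by hand: x = x - lr * x.grad, done in place without tracking gradients.
with torch.no_grad():
    x_manual -= lr * x_manual.grad

same = torch.allclose(x_sgd.detach(), x_manual.detach())
print(same)  # True
```

Other optimizers (Adam, SGD with momentum, ...) keep extra state and apply a different update rule, but they read the gradients from .grad in the same way.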
Note that in a training loop we would also call optim.zero_grad() before each backward pass, because in each training step we want to compute new gradients - we don't care about gradients from the previous batch. backward() adds to whatever is already stored in .grad rather than overwriting it, so not zeroing grads would lead to gradient accumulation across batches.
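The accumulation is easy to see with the same toy example. This is only a sketch; accumulated and fresh are illustrative variable names.

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
optim = torch.optim.SGD([x], lr=0.001)

# Without zero_grad(): a second backward() adds to the stored gradient.
(100 * x).sum().backward()
(100 * x).sum().backward()
accumulated = x.grad.clone()
print(accumulated)  # tensor([200., 200.])

# With zero_grad(): the old gradient is cleared before the next backward().
optim.zero_grad()
(100 * x).sum().backward()
fresh = x.grad.clone()
print(fresh)  # tensor([100., 100.])
```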