pytorch - connection between loss.backward() and optimizer.step()

谎友^ 2020-12-23 13:04

Where is an explicit connection between the optimizer and the loss?

How does the optimizer know where to get the gradients of the loss with respect to the parameters it should update?

5 Answers
  •  予麋鹿 (OP)
     2020-12-23 13:26

    Perhaps this will clarify the connection between loss.backward() and optim.step() a little (although the other answers are already to the point).

    import torch

    # Our "model": x is the (leaf) parameter we want to learn
    x = torch.tensor([1., 2.], requires_grad=True)
    y = 100*x

    # Compute loss
    loss = y.sum()

    # Compute gradients of the loss w.r.t. the parameters
    print(x.grad)     # None
    loss.backward()
    print(x.grad)     # tensor([100., 100.])

    # Modify the parameters by subtracting lr * gradient
    optim = torch.optim.SGD([x], lr=0.001)
    print(x)        # tensor([1., 2.], requires_grad=True)
    optim.step()
    print(x)        # tensor([0.9000, 1.9000], requires_grad=True)
    

    loss.backward() populates the .grad attribute of every leaf tensor with requires_grad=True in the computational graph whose root is loss (only x in this case).

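    As a side note (my own illustration, not part of the original answer): only leaf tensors with requires_grad=True get their .grad populated; an intermediate such as y does not keep a gradient unless you call y.retain_grad() before backward().

    import torch

    x = torch.tensor([1., 2.], requires_grad=True)   # leaf parameter
    y = 100*x                                         # intermediate (non-leaf)
    loss = y.sum()                                    # root of the graph
    loss.backward()

    print(x.is_leaf, y.is_leaf)   # True False
    print(x.grad)                 # tensor([100., 100.])
    print(y.grad)                 # None (PyTorch warns when reading .grad of a non-leaf)
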
    The optimizer just iterates through the list of parameters (tensors) it received on initialization, and wherever a tensor has requires_grad=True it subtracts the value of the gradient stored in its .grad attribute (simply multiplied by the learning rate in the case of plain SGD). It doesn't need to know with respect to which loss the gradients were computed; it only needs access to that .grad attribute so it can do x = x - lr * x.grad, as sketched below.

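    To make the "subtract lr * grad" part concrete, here is a rough sketch of what a vanilla SGD step does with the parameter list it was given (just an illustration of the idea, not PyTorch's actual optimizer code):

    import torch

    def sgd_step(params, lr):
        # Walk the parameters and update them in place:
        # p = p - lr * p.grad (skipping any parameter without a gradient).
        with torch.no_grad():
            for p in params:
                if p.grad is not None:
                    p -= lr * p.grad

    x = torch.tensor([1., 2.], requires_grad=True)
    loss = (100*x).sum()
    loss.backward()
    sgd_step([x], lr=0.001)
    print(x)   # tensor([0.9000, 1.9000], requires_grad=True)
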
    Note that if we were doing this in a training loop we would call optim.zero_grad(), because in each training step we want fresh gradients - we don't care about gradients from the previous batch. Not zeroing the grads would lead to gradient accumulation across batches.

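    A quick way to see that accumulation (again just an illustration): calling backward() twice without zeroing in between sums the gradients into .grad, which is exactly what an optim.zero_grad() at the start of each step prevents.

    import torch

    x = torch.tensor([1., 2.], requires_grad=True)
    optim = torch.optim.SGD([x], lr=0.001)

    (100*x).sum().backward()
    print(x.grad)    # tensor([100., 100.])

    (100*x).sum().backward()     # no zero_grad() in between
    print(x.grad)    # tensor([200., 200.]) - gradients accumulated

    optim.zero_grad()
    print(x.grad)    # None with the default set_to_none=True (a zero tensor on older versions)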