torch.optim returns “ValueError: can't optimize a non-leaf Tensor” for multidimensional tensor

Submitted by 梦想的初衷 on 2021-01-29 08:42:16

Question


I am trying to optimize the translations of the vertices of a scene with torch.optim.Adam. It is a code piece from the redner tutorial series, which works fine with the initial setting. It optimizes the scene by shifting all the vertices by the same value, called translation. Here is the original code:

vertices = []
for obj in base:
    vertices.append(obj.vertices.clone())

def model(translation):
    for obj, v in zip(base, vertices):
        obj.vertices = v + translation
    # Assemble the 3D scene.
    scene = pyredner.Scene(camera = camera, objects = objects)
    # Render the scene.
    img = pyredner.render_albedo(scene)
    return img

# Initial guess
# Set requires_grad=True since we want to optimize them later

translation = torch.tensor([10.0, -10.0, 10.0], device = pyredner.get_device(), requires_grad=True)

init = model(translation)
# Visualize the initial guess

t_optimizer = torch.optim.Adam([translation], lr=0.5)

I tried to modify the code so that it calculates an individual translation for each of the vertices. For this I applied the following modification to the code above, which changes the shape of translation from torch.Size([3]) to torch.Size([43380, 3]):

# translation = torch.tensor([10.0, -10.0, 10.0], device = pyredner.get_device(), requires_grad=True)
translation = base[0].vertices.clone().detach().requires_grad_(True)
translation[:] = 10.0

This raises ValueError: can't optimize a non-leaf Tensor. Could you please help me work around the problem?

PS: I am sorry for the long text; I am very new to this subject and wanted to state the problem as comprehensively as possible.


Answer 1:


Only leaf tensors can be optimised. A leaf tensor is a tensor that was created at the beginning of a graph, i.e. there is no tracked operation in the graph that produced it. In other words, when you apply any operation to a tensor with requires_grad=True, PyTorch keeps track of these operations in order to do the backpropagation later, and you cannot give one of these intermediate results to the optimiser.

An example shows that more clearly:

weight = torch.randn((2, 2), requires_grad=True)
# => tensor([[ 1.5559,  0.4560],
#            [-1.4852, -0.8837]], requires_grad=True)

weight.is_leaf # => True

result = weight * 2
# => tensor([[ 3.1118,  0.9121],
#            [-2.9705, -1.7675]], grad_fn=<MulBackward0>)
# grad_fn defines how to do the back propagation (kept track of the multiplication)

result.is_leaf # => False

The result in this example cannot be optimised, since it's not a leaf tensor. Similarly, in your case translation is not a leaf tensor because of the operation you perform after it was created:

translation[:] = 10.0
translation.is_leaf # => False

This has grad_fn=<CopySlices>, therefore it's not a leaf and you cannot pass it to the optimiser. To avoid that, you have to create a new tensor from it that is detached from the graph.

# Not setting requires_grad, so that the next operation is not tracked
translation = base[0].vertices.clone().detach()
translation[:] = 10.0
# Now setting requires_grad so it is tracked in the graph and can be optimised
translation = translation.requires_grad_(True)

What you're really doing here is creating a new tensor filled with the value 10.0, with the same size as the vertices tensor. This can be achieved much more easily with torch.full_like:

translation = torch.full_like(base[0].vertices, 10.0, requires_grad=True)
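
As a quick sanity check (a sketch, assuming base[0].vertices is the torch.Size([43380, 3]) float tensor from the question), the tensor created this way is a leaf and can be passed straight to Adam:

translation.is_leaf  # => True
translation.shape    # => torch.Size([43380, 3]), one offset per vertex

t_optimizer = torch.optim.Adam([translation], lr=0.5)  # accepted, since translation is a leaf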



Answer 2:


What is a leaf variable?

A leaf variable is a variable that is at the beginning of the graph. It means that no operation tracked by the Autograd engine created the variable (that's why it is called a leaf variable). When optimizing neural networks, we want to update the leaf variables, such as the model weights, inputs, etc.

So to be able to give tensors to the optimizer, they should follow the definition of the leaf variable above.

A few examples.

a = torch.rand(10, requires_grad=True)

Here, a is a leaf variable.

a = torch.rand(10, requires_grad=True).double()

Here, a is NOT a leaf variable as it was created by the operation that cast a float tensor into a double tensor.

a = torch.rand(10).requires_grad_().double() 

This is equivalent to the previous formulation: a is not a leaf variable.

a = torch.rand(10).double() 

Here, a does not require gradients and has no operation creating it (tracked by the Autograd engine).

a = torch.rand(10).double().requires_grad_() 

Here, a requires grad and has no operation creating it: it's a leaf variable and can be given to an optimizer.

a = torch.rand(10, requires_grad=True, device="cuda") 

Here, a requires grad and has no operation creating it: it's a leaf variable and can be given to an optimizer.

I have borrowed the above explanation from this discussion thread.
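
If you want to verify these cases yourself, a minimal check (just the CPU examples) could look like this:

import torch

print(torch.rand(10, requires_grad=True).is_leaf)           # True  (created directly)
print(torch.rand(10, requires_grad=True).double().is_leaf)  # False (produced by a tracked cast)
print(torch.rand(10).double().requires_grad_().is_leaf)     # True  (cast happened before tracking)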


So, in your case, the translation[:] = 10.0 operation makes translation a non-leaf variable. A potential solution would be:

translation = base[0].vertices.clone().detach()
translation[:] = 10.0
translation = translation.requires_grad_(True)

In the last statement you set requires_grad, so from now on the tensor will be tracked and can be optimized.
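
Putting it together for your scene, here is a sketch under the assumptions of the tutorial code from the question (camera, objects, base, and the model function are defined as above; the reference image target is assumed, as in the redner tutorials):

translation = base[0].vertices.clone().detach()
translation[:] = 10.0
translation = translation.requires_grad_(True)  # leaf, shape torch.Size([43380, 3])

t_optimizer = torch.optim.Adam([translation], lr=0.5)

for t in range(100):
    t_optimizer.zero_grad()
    img = model(translation)             # renders with one offset per vertex
    loss = (img - target).pow(2).mean()  # target is an assumed reference image
    loss.backward()
    t_optimizer.step()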



Source: https://stackoverflow.com/questions/61851506/torch-optim-returns-valueerror-cant-optimize-a-non-leaf-tensor-for-multidime
