Question
Let us take a look at this simple class:
import torch.nn as nn
import torch.nn.functional as F

class Temp1(nn.Module):
    def __init__(self, stateSize, actionSize, layers=[10, 5], activations=[F.tanh, F.tanh]):
        super(Temp1, self).__init__()
        self.layer1 = nn.Linear(stateSize, layers[0])
        self.layer2 = nn.Linear(layers[0], layers[1])
        self.fcFinal = nn.Linear(layers[1], actionSize)
This is a fairly straightforward PyTorch module. It creates a simple sequential dense network. If we check its parameters, we see the following:
t1 = Temp1(2, 2)
list(t1.parameters())
This is the expected result ...
[Parameter containing:
tensor([[-0.0311, -0.5513],
        [-0.0634, -0.3783],
        [-0.2514,  0.6139],
        [ 0.4711, -0.0241],
        [-0.1739,  0.2208],
        [-0.1533,  0.3838],
        [-0.6490, -0.5784],
        [ 0.5312,  0.6703],
        [ 0.3506,  0.3652],
        [ 0.1768, -0.4158]], requires_grad=True), Parameter containing:
tensor([-0.3199, -0.4154, -0.5530, -0.6738, -0.4411,  0.2641, -0.3576,  0.0447,
         0.0254,  0.0965], requires_grad=True), Parameter containing:
tensor([[-2.8257e-01,  6.7583e-02,  9.0356e-02,  1.0868e-01,  4.0876e-02,
          4.0616e-02,  4.4419e-02, -8.1544e-02,  2.5244e-01,  3.8777e-03],
        [-8.0950e-03, -1.4175e-01, -2.9492e-01,  3.1439e-01, -2.3065e-01,
         -6.6631e-02,  3.0047e-01,  2.8353e-01,  2.3457e-01, -3.1399e-03],
        [-5.2522e-02, -2.2183e-01, -1.5485e-01,  2.6317e-01,  2.8273e-01,
         -7.4823e-02, -5.3704e-02,  9.3526e-02, -1.7916e-01, -3.1132e-04],
        [ 8.9063e-02,  2.9263e-01, -1.0052e-01,  8.7005e-02, -1.1246e-01,
         -2.7968e-01,  4.1411e-02, -1.6776e-01,  1.2363e-01, -2.2808e-01],
        [ 2.9244e-02,  5.8296e-02, -2.9729e-01, -3.1437e-01, -9.3182e-02,
         -7.5236e-03,  5.6159e-02, -2.2075e-02,  1.0337e-01,  8.1123e-02]],
       requires_grad=True), Parameter containing:
tensor([ 0.2240,  0.0997, -0.0047, -0.1784, -0.0369], requires_grad=True), Parameter containing:
tensor([[ 0.3546, -0.2180,  0.1723, -0.0463,  0.2572],
        [-0.1669, -0.1364, -0.0398,  0.2233, -0.1805]], requires_grad=True), Parameter containing:
tensor([ 0.0871, -0.1698], requires_grad=True)]
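As a sanity check, these six tensors account for all three registered layers: layer1 is Linear(2, 10), layer2 is Linear(10, 5), and fcFinal is Linear(5, 2), giving 97 parameters in total. A minimal sketch, reusing the t1 instance above:

# one weight and one bias tensor for each of the three registered Linear layers
print([tuple(p.shape) for p in t1.parameters()])
# [(10, 2), (10,), (5, 10), (5,), (2, 5), (2,)]
print(sum(p.numel() for p in t1.parameters()))  # 30 + 55 + 12 = 97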
Now, let us try to generalize this a bit:
class Temp(nn.Module):
    def __init__(self, stateSize, actionSize, layers=[10, 5], activations=[F.tanh, F.tanh]):
        super(Temp, self).__init__()
        # Generate the fully connected layer functions
        self.fcLayers = []
        oldN = stateSize
        for i, layer in enumerate(layers):
            self.fcLayers.append(nn.Linear(oldN, layer))
            oldN = layer
        self.fcFinal = nn.Linear(oldN, actionSize)
It turns out that the number of parameters within this module is no longer the same ...
t = Temp(2, 3)
list(t.parameters())
[Parameter containing:
tensor([[-0.3342,  0.4111,  0.0418,  0.4457,  0.0648],
        [ 0.4364, -0.0360, -0.2239,  0.4025,  0.1661],
        [ 0.1932, -0.0896,  0.3269, -0.2179,  0.1035]], requires_grad=True),
 Parameter containing:
tensor([-0.2867, -0.1354, -0.0026], requires_grad=True)]
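A quick check confirms that the two surviving tensors belong to self.fcFinal alone, which here is Linear(5, 3): a (3, 5) weight and a (3,) bias. The layers held in the plain python list are invisible:

print([tuple(p.shape) for p in t.parameters()])  # [(3, 5), (3,)], fcFinal only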
I believe I understand why this is happening. The bigger question is: how do we overcome this problem? The second, generalized version, for example, will not be moved to the GPU properly and will not be trained by an optimizer.
Answer 1:
The problem is that most of the nn.Linear layers in the "generalized" version are stored in a regular pythonic list (self.fcLayers). PyTorch does not know to look for nn.Parameters inside regular pythonic members of nn.Module.
Solution:
If you wish to store nn.Modules in a way that PyTorch can manage them, you need to use specialized PyTorch containers.
For instance, if you use nn.ModuleList instead of a regular pythonic list:
self.fcLayers = nn.ModuleList([])
your example should work fine.
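As a minimal sketch (keeping the question's constructor signature, including the still-unused activations argument), here is the generalized class with the plain list swapped for nn.ModuleList:

class Temp(nn.Module):
    def __init__(self, stateSize, actionSize, layers=[10, 5], activations=[F.tanh, F.tanh]):
        super(Temp, self).__init__()
        # nn.ModuleList registers each appended nn.Linear as a submodule,
        # so its parameters become visible to pytorch
        self.fcLayers = nn.ModuleList([])
        oldN = stateSize
        for layer in layers:
            self.fcLayers.append(nn.Linear(oldN, layer))
            oldN = layer
        self.fcFinal = nn.Linear(oldN, actionSize)

With this change, list(Temp(2, 3).parameters()) again returns six tensors, matching the hand-written Temp1.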
BTW, you need PyTorch to know that members of your nn.Module are modules themselves not only to get their parameters, but also for other functions, such as moving them to GPU/CPU, setting their mode to eval/training, etc.
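A short sketch, assuming the nn.ModuleList version above, showing that these module-wide calls now reach the hidden layers too:

import torch

t = Temp(2, 3)
t.eval()                          # propagates eval mode to every registered submodule
if torch.cuda.is_available():
    t = t.to('cuda')              # moves all registered parameters to the GPU
print(len(list(t.parameters())))  # 6: weight + bias for each of the three layers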
Source: https://stackoverflow.com/questions/56370283/pytorch-nn-module-generalization