Neural Network Always Produces Same/Similar Outputs for Any Input

猫巷女王i 2020-12-13 04:24

I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same out

11 answers

北荒 (original poster)

2020-12-13 04:42
So I realise this is extremely late for the original post, but I came across this because I was having a similar problem and none of the reasons posted here cover what was wrong in my case.

I was working on a simple regression problem, but every time I trained the network it converged to a point where it gave the same output (or sometimes a few different outputs) for every input. I played with the learning rate, the number of hidden layers/nodes, the optimization algorithm, etc., but it made no difference. The problem persisted even in a ridiculously simple example, trying to predict the (1-D) outputs for two different (1-D) inputs:
```
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class net(nn.Module):
    """One hidden layer with ReLU, followed by a single linear output unit."""

    def __init__(self, obs_size, hidden_size):
        super(net, self).__init__()
        self.fc = nn.Linear(obs_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, obs):
        h = F.relu(self.fc(obs))
        return self.out(h)  # shape [batch, 1]


inputs = np.array([[0.5], [0.9]])  # shape [2, 1]
# Shape [2]; note that the network output has shape [2, 1].
targets = torch.tensor([3.0, 2.0], dtype=torch.float32)

network = net(1, 5)
optimizer = torch.optim.Adam(network.parameters(), lr=0.001)

for i in range(10000):
    out = network(torch.tensor(inputs, dtype=torch.float32))
    loss = F.mse_loss(out, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # .item() extracts a Python float from a one-element tensor
    print("Loss: %f outputs: %f, %f" % (loss.item(), out[0].item(), out[1].item()))
```
but it STILL always output the mean of the two targets for both inputs. It turns out the outputs and targets did not have the same shape: the targets were Size[2] and the outputs were Size[2, 1]. Inside the MSE loss, PyTorch broadcast both to Size[2, 2], so every output was compared against every target. That loss is minimized when every output equals the mean of the targets (2.5 here), regardless of the input, which completely messes everything up. Once I changed:
```
targets = torch.tensor([3.0, 2.0], dtype=torch.float32)
```
to
```
targets = torch.tensor([[3.0], [2.0]], dtype=torch.float32)
```
it worked as it should. This was with PyTorch, but I suspect other libraries broadcast tensors in the same way.
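
For anyone who wants to see the broadcast directly, here is a minimal sketch of the effect described above. The tensors `perfect` and `constant` are made-up stand-ins for network outputs, and `checked_mse` is a hypothetical helper name (not part of PyTorch) that simply refuses mismatched shapes; treat it as one possible guard rather than the recommended way.

```python
import warnings

import torch
import torch.nn.functional as F

targets = torch.tensor([3.0, 2.0])        # shape [2], as in the original code
perfect = torch.tensor([[3.0], [2.0]])    # shape [2, 1]: exactly the right answers
constant = torch.tensor([[2.5], [2.5]])   # shape [2, 1]: the mean for both inputs

# Plain subtraction shows the broadcast: [2, 1] against [2] becomes [2, 2].
diff = perfect - targets
print(diff.shape)  # torch.Size([2, 2])

with warnings.catch_warnings():
    # mse_loss warns when input and target sizes differ; silenced for the demo.
    warnings.simplefilter("ignore")
    bad_perfect = F.mse_loss(perfect, targets)    # 0.5: right answers are penalized
    bad_constant = F.mse_loss(constant, targets)  # 0.25: the constant mean scores better

# With matching shapes, the right answers give zero loss.
good_perfect = F.mse_loss(perfect, targets.unsqueeze(1))  # 0.0


def checked_mse(pred, target):
    """Hypothetical guard: fail loudly instead of silently broadcasting."""
    if pred.shape != target.shape:
        raise ValueError(
            f"shape mismatch: {tuple(pred.shape)} vs {tuple(target.shape)}"
        )
    return F.mse_loss(pred, target)


print(bad_perfect.item(), bad_constant.item(), good_perfect.item())
```

Under the mismatched shapes, predicting the mean of the targets for every input scores better than predicting the correct values, which is why training collapses to a constant output. Reshaping the targets (for example with `targets.unsqueeze(1)`) or squeezing the network output so both are Size[2] removes the mismatch.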