Neural Network Always Produces Same/Similar Outputs for Any Input

猫巷女王i 2020-12-13 04:24

I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same output for any input.

11 Answers
  • 2020-12-13 04:33

    I faced a similar issue earlier when my data was not properly normalized. Once I normalized the data everything ran correctly.

    Recently, I faced this issue again and after debugging, I found that there can be another reason for neural networks giving the same output. If you have a neural network that has a weight decay term such as that in the RSNNS package, make sure that your decay term is not so large that all weights go to essentially 0.

    I was using the caret package in R. Initially, I was using a decay hyperparameter of 0.01. When I looked at the diagnostics, I saw that the RMSE was being calculated for each fold (of cross-validation), but the R-squared was always NA. In this case all predictions were coming out to the same value.

    Once I reduced the decay to a much lower value (1E-5 and lower), I got the expected results.
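
    As a rough illustration of the same effect in PyTorch (my own sketch, not the caret/RSNNS setup above; here the optimizer's `weight_decay` argument plays the role of the decay term):

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        x = torch.rand(100, 4)                      # toy inputs
        y = x.sum(dim=1, keepdim=True)              # toy regression target

        def prediction_spread(weight_decay):
            model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
            opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=weight_decay)
            for _ in range(2000):
                opt.zero_grad()
                loss = ((model(x) - y) ** 2).mean()
                loss.backward()
                opt.step()
            return model(x).std().item()            # how much the predictions vary

        print(prediction_spread(1.0))     # large decay: weights driven toward 0, predictions nearly constant
        print(prediction_spread(1e-5))    # small decay: predictions vary with the input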

    I hope this helps.

  • 2020-12-13 04:34

    Based on your comments, I'd agree with @finnw that you have a bias problem. You should treat the bias as a constant "1" (or -1 if you prefer) input to each neuron. Each neuron will also have its own weight for the bias, so a neuron's output should be the sum of the weighted inputs, plus the bias times its weight, passed through the activation function. Bias weights are updated during training just like the other weights.
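
    For example, a minimal sketch of a single neuron handled this way (my own illustration, assuming a sigmoid activation; the function names are hypothetical):

        import math

        def sigmoid(z):
            return 1.0 / (1.0 + math.exp(-z))

        def neuron_output(inputs, weights, bias_weight):
            # weighted inputs, plus the bias (a constant 1) times its own weight
            z = sum(w * x for w, x in zip(weights, inputs)) + bias_weight * 1.0
            return sigmoid(z)

        def update_weights(inputs, weights, bias_weight, delta, lr=0.02):
            # the bias weight is trained exactly like the others;
            # its "input" is simply the constant 1.0
            weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
            bias_weight = bias_weight + lr * delta * 1.0
            return weights, bias_weight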

    Fausett's "Fundamentals of Neural Networks" (p.300) has an XOR example using binary inputs and a network with 2 inputs, 1 hidden layer of 4 neurons and one output neuron. Weights are randomly initialized between +0.5 and -0.5. With a learning rate of 0.02 the example network converges after about 3000 epochs. You should be able to get a result in the same ballpark if you get the bias problems (and any other bugs) ironed out.

    Also note that you cannot solve the XOR problem without a hidden layer in your network.

  • 2020-12-13 04:34

    I haven't tested it with the XOR problem in the question, but for my original dataset based on Tic-Tac-Toe, I believe that I have gotten the network to train somewhat (I only ran 1000 epochs, which wasn't enough): the quickpropagation network can win/tie over half of its games; backpropagation can get about 41%. The problems came down to implementation errors (small ones) and not understanding the difference between the error derivative (which is per-weight) and the error for each neuron, which I did not pick up on in my research. @darkcanuck's answer about training the bias similarly to a weight would probably have helped, though I didn't implement it. I also rewrote my code in Python so that I could more easily hack with it. Therefore, although I haven't gotten the network to match the minimax algorithm's efficiency, I believe that I have managed to solve the problem.
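
    For anyone hitting the same confusion, here is a rough sketch (my own, with hypothetical names) of that distinction for a single sigmoid output neuron:

        import math

        def sigmoid(z):
            return 1.0 / (1.0 + math.exp(-z))

        def backprop_terms(inputs, weights, target):
            out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
            # error term for the neuron ("delta"): a single number per neuron
            delta = (target - out) * out * (1.0 - out)
            # error derivative: one number per weight, the neuron's delta
            # scaled by the input flowing through that weight
            per_weight = [delta * x for x in inputs]
            return delta, per_weight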

  • 2020-12-13 04:40

    It's hard to tell without seeing a code sample, but a bias bug can have that effect (e.g. forgetting to add the bias to the input), so I would take a closer look at that part of the code.

  • 2020-12-13 04:40

    For me it was happening exactly as in your case: the output of the neural network was always the same, no matter the training, the number of layers, etc.

    It turned out my back-propagation algorithm had a problem. In one place I was multiplying by -1 where it wasn't required.

    There could be other problems like this. The question is: how do you debug them?

    Steps to debug:

    Step 1: Write the algorithm so that it can take a variable number of hidden layers and a variable number of input & output nodes.
    Step 2: Reduce the hidden layers to 0. Reduce the inputs to 2 nodes and the output to 1 node.
    Step 3: Now train it on the binary OR operation.
    Step 4: If it converges correctly, go to Step 8.
    Step 5: If it doesn't converge, train it on only 1 training sample.
    Step 6: Print all the forward- and back-propagation variables (weights, node outputs, deltas, etc.).
    Step 7: Take pen & paper and calculate all the variables manually.
    Step 8: Cross-verify the manually calculated values against the algorithm's.
    Step 9: If you don't find any problem with 0 hidden layers, increase the hidden layer size to 1 and repeat Steps 5-8.
    

    It sounds like a lot of work, but IMHO it works very well; a minimal sketch of the first few steps is shown below.
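
    Here is that sketch for Steps 2 and 3 (my own code, not the answer's): no hidden layer, 2 inputs, 1 sigmoid output, trained on binary OR, printing the intermediate values so they can be checked by hand:

        import math, random

        random.seed(0)
        data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]   # binary OR
        w = [random.uniform(-0.5, 0.5) for _ in range(2)]
        b = random.uniform(-0.5, 0.5)
        lr = 0.5

        def forward(x):
            return 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))

        for epoch in range(2001):
            for x, t in data:
                out = forward(x)
                delta = (t - out) * out * (1 - out)      # per-neuron error term
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta * 1.0                    # bias treated as a constant-1 input
            if epoch % 500 == 0:
                print(epoch, w, b, [round(forward(x), 2) for x, _ in data])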

  • 2020-12-13 04:42

    So I realise this is extremely late for the original post, but I came across this because I was having a similar problem and none of the reasons posted here cover what was wrong in my case.

    I was working on a simple regression problem, but every time I trained the network it would converge to a point where it was giving me the same output (or sometimes a few different outputs) for every input. I played with the learning rate, the number of hidden layers/nodes, the optimization algorithm, etc., but it made no difference. Even when I looked at a ridiculously simple example, trying to predict the output (1-D) of two different inputs (1-D):

        import numpy as np
        import torch
        import torch.nn as nn
        import torch.nn.functional as F
    
        class net(nn.Module):
            def __init__(self, obs_size, hidden_size):
                super(net, self).__init__()
                self.fc = nn.Linear(obs_size, hidden_size)
                self.out = nn.Linear(hidden_size, 1)
    
            def forward(self, obs):
                h = F.relu(self.fc(obs))
                return self.out(h)
    
        inputs = np.array([[0.5],[0.9]])                           # shape (2, 1)
        targets = torch.tensor([3.0, 2.0], dtype=torch.float32)   # shape [2] (1-D)
    
        network = net(1,5)
        optimizer = torch.optim.Adam(network.parameters(), lr=0.001)
    
        for i in range(10000):
            out = network(torch.tensor(inputs, dtype=torch.float32))
            loss = F.mse_loss(out, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            print("Loss: %f outputs: %f, %f"%(loss.data.numpy(), out.data.numpy()[0], out.data.numpy()[1]))
    

    but STILL it was always outputting the average of the target values for both inputs. It turns out the reason is that the dimensions of my outputs and targets were not the same: the targets were Size[2], and the outputs were Size[2,1], and for some reason PyTorch was broadcasting the outputs to Size[2,2] in the MSE loss, which completely messes everything up. Once I changed:

    targets = torch.tensor([3.0, 2.0], dtype=torch.float32)
    

    to

    targets = torch.tensor([[3.0], [2.0]], dtype=torch.float32)
    

    It worked as it should. This was obviously done with PyTorch, but I suspect other libraries may broadcast variables in the same way.
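
    To make the broadcast concrete, a small check (my own addition):

        import torch
        import torch.nn.functional as F

        out = torch.tensor([[1.0], [2.0]])         # shape [2, 1], like the network output
        tgt = torch.tensor([3.0, 2.0])             # shape [2], like the original targets

        # (out - tgt) broadcasts to shape [2, 2]: every output is compared
        # against every target, which is why the average minimises the loss
        print((out - tgt).shape)                   # torch.Size([2, 2])
        print(F.mse_loss(out, tgt.unsqueeze(1)))   # matching shapes give the intended loss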
