Question
I have implemented a neural network (using CUDA) with 2 layers (2 neurons per layer). I'm trying to make it learn 2 simple quadratic polynomial functions using backpropagation.
But instead of converging, it is diverging (the output is becoming infinity).
Here are some more details about what I've tried:
- I had set the initial weights to 0, but since it was diverging I have randomized the initial weights
- I read that a neural network might diverge if the learning rate is too high, so I reduced the learning rate to 0.000001
- The two functions I am trying to get it to learn are: 3 * i + 7 * j + 9 and j * j + i * i + 24 (I am giving the layer i and j as input; see the sketch after this list)
- I had implemented it as a single layer previously, and that could approximate the polynomial functions better
- I am thinking of implementing momentum in this network, but I'm not sure it would help it learn
- I am using a linear (as in no) activation function
- There is oscillation in the beginning, but the output starts diverging the moment any of the weights becomes greater than 1
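Since the actual code isn't posted, here is a minimal sketch of the setup described in the list, just to make the discussion concrete. Everything here (names, sizes, initial values, the host driver) is my illustrative assumption, not the asker's code: a 2-layer, 2-neurons-per-layer network with linear (identity) activations, run forward on inputs (i, j) against the two target polynomials.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Forward pass for one (i, j) pair per thread:
//   h = W1 * [i, j]^T + b1 ;  y = W2 * h + b2   (no nonlinearity in between)
__global__ void forward(const float* x, const float* W1, const float* b1,
                        const float* W2, const float* b2, float* y, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;
    float i = x[2 * k], j = x[2 * k + 1];
    float h0 = W1[0] * i + W1[1] * j + b1[0];        // hidden neuron 0
    float h1 = W1[2] * i + W1[3] * j + b1[1];        // hidden neuron 1
    y[2 * k]     = W2[0] * h0 + W2[1] * h1 + b2[0];  // target: 3*i + 7*j + 9
    y[2 * k + 1] = W2[2] * h0 + W2[3] * h1 + b2[1];  // target: i*i + j*j + 24
}

int main()
{
    const int n = 4;
    float h_x[2 * n] = {0, 0, 1, 2, 2, 1, 3, 3};  // sample (i, j) pairs
    float h_W1[4] = {0.3f, -0.2f, 0.1f, 0.4f};    // small random initial weights
    float h_b1[2] = {0.0f, 0.0f};
    float h_W2[4] = {0.2f, 0.1f, -0.3f, 0.2f};
    float h_b2[2] = {0.0f, 0.0f};
    float h_y[2 * n];

    float *d_x, *d_W1, *d_b1, *d_W2, *d_b2, *d_y;
    cudaMalloc(&d_x, sizeof(h_x));   cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);
    cudaMalloc(&d_W1, sizeof(h_W1)); cudaMemcpy(d_W1, h_W1, sizeof(h_W1), cudaMemcpyHostToDevice);
    cudaMalloc(&d_b1, sizeof(h_b1)); cudaMemcpy(d_b1, h_b1, sizeof(h_b1), cudaMemcpyHostToDevice);
    cudaMalloc(&d_W2, sizeof(h_W2)); cudaMemcpy(d_W2, h_W2, sizeof(h_W2), cudaMemcpyHostToDevice);
    cudaMalloc(&d_b2, sizeof(h_b2)); cudaMemcpy(d_b2, h_b2, sizeof(h_b2), cudaMemcpyHostToDevice);
    cudaMalloc(&d_y, sizeof(h_y));

    forward<<<1, n>>>(d_x, d_W1, d_b1, d_W2, d_b2, d_y, n);
    cudaMemcpy(h_y, d_y, sizeof(h_y), cudaMemcpyDeviceToHost);
    for (int k = 0; k < n; ++k)
        printf("i=%g j=%g -> y0=%g y1=%g\n", h_x[2 * k], h_x[2 * k + 1], h_y[2 * k], h_y[2 * k + 1]);

    cudaFree(d_x); cudaFree(d_W1); cudaFree(d_b1);
    cudaFree(d_W2); cudaFree(d_b2); cudaFree(d_y);
    return 0;
}
```

One thing the sketch makes visible: with no nonlinearity, y = W2 * (W1 * x + b1) + b2 is itself just an affine function of (i, j), so the second layer adds no representational power over the single-layer version, and i*i + j*j + 24 cannot be fit exactly either way.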
I have checked and rechecked my code, but there doesn't seem to be any kind of issue with it.
So here's my question: what is going wrong here?
Any pointers will be appreciated.
Answer 1:
If the problem you are trying to solve is of the classification type, try a 3-layer network (3 is enough, according to Kolmogorov). Connections from inputs A and B to a hidden node C (C = A*wa + B*wb) represent a line in AB space. That line divides the correct and incorrect half-spaces. The connections from the hidden layer to the output put the hidden-layer values in correlation with each other, giving you the desired output.
Depending on your data, the error function may look like a hair comb, so implementing momentum should help. Keeping the learning rate at 1 proved optimal for me.
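A minimal sketch of what that momentum update could look like on the device; the kernel name and hyperparameter names are my illustrative choices, not from the answer:

```cuda
// SGD with momentum: the velocity buffer v (zero-initialized, same size as w)
// accumulates a decaying average of past steps, which smooths descent over a
// comb-shaped error surface.
__global__ void sgd_momentum_step(float* w, float* v, const float* grad,
                                  float lr, float mu, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;
    v[k] = mu * v[k] - lr * grad[k];  // blend the previous step with the new gradient
    w[k] += v[k];                     // move along the accumulated direction
}
// Example launch after each gradient computation, with the answer's lr = 1:
//   sgd_momentum_step<<<blocks, threads>>>(d_w, d_v, d_grad, 1.0f, 0.9f, n);
```

The mu = 0.9 in the launch example is a common default, not something from the answer.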
Your training sessions will get stuck in local minima every once in a while, so network training will consist of a few successive sessions. If a session exceeds the maximum number of iterations, or the amplitude is too high, or the error is obviously high, the session has failed: start another.
At the beginning of each session, reinitialize your weights with random values in (-0.5, +0.5).
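For the per-session reinitialization, a host-side helper like this would do; the function name and the use of std::rand() instead of cuRAND are my simplifying assumptions:

```cuda
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Fill a device weight buffer with uniform random values in roughly (-0.5, +0.5),
// then copy it over; call once at the start of each training session.
void reinit_weights(float* d_w, int n)
{
    std::vector<float> h_w(n);
    for (int k = 0; k < n; ++k)
        h_w[k] = (float)std::rand() / RAND_MAX - 0.5f;  // uniform in [-0.5, +0.5]
    cudaMemcpy(d_w, h_w.data(), n * sizeof(float), cudaMemcpyHostToDevice);
}
```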
It really helps to chart your error descent. You will get that "Aha!" factor.
Answer 2:
The most common reason for neural network code to diverge is that the coder has forgotten to put the negative sign in the change-in-weight expression.
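Concretely, the update has to step against the gradient. A sketch of the correct sign (array and parameter names are illustrative):

```cuda
// Plain gradient-descent update: w moves *against* dE/dw.
__global__ void sgd_step(float* w, const float* dE_dw, float lr, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;
    w[k] -= lr * dE_dw[k];  // writing += here climbs the error surface
                            // instead, which produces exactly this divergence
}
```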
Another reason could be that there is a problem with the error expression used for calculating the gradients.
If neither of these is the issue, then we would need to see the code to answer.
Source: https://stackoverflow.com/questions/17985319/neural-network-diverging-instead-of-converging