XOR Neural Network(FF) converges to 0.5

Submitted by 一世执手 on 2020-01-17 02:50:08

Question


I've created a program that lets me build flexible neural networks of any size/depth, and I'm testing it on a simple XOR setup (feed-forward, sigmoid activation, backpropagation, no batching).

EDIT: The following is a completely new approach to my original question which didn't supply enough information

EDIT 2: I now initialize my weights between -2.5 and 2.5, and fixed a problem in my code where I had dropped some negative signs. Now it converges to either 0 for all cases or 1 for all cases, instead of 0.5.

Everything works exactly the way I THINK it should, yet the network converges toward 0.5 instead of producing outputs near 0 and 1. I've hand-calculated a complete pass of feeding forward, computing delta errors, backpropagating, etc., and it matched what the program produced. I've also tried tuning the learning rate and momentum, and increasing the network's complexity (more neurons/layers).

Because of this, I assume that either one of my equations is wrong, or I have some other sort of misunderstanding in my Neural Network. The following is the logic with equations that I follow for each step:

I have an input layer with two inputs and a bias, a hidden with 2 neurons and a bias, and an output with 1 neuron.

  1. Take the input from each of the two input neurons and the bias neuron, then multiply them by their respective weights, and then add them together as the input for each of the two neurons in the hidden layer.
  2. Take the input of each hidden neuron, pass it through the Sigmoid activation function (Reference 1) and use that as the neuron's output.
  3. Take the outputs of each neuron in hidden layer (1 for the bias), multiply them by their respective weights, and add those values to the output neuron's input.
  4. Pass the output neuron's input through the Sigmoid activation function, and use that as the output for the whole network.
  5. Calculate the Delta Error(Reference 2) for the output neuron
  6. Calculate the Delta Error(Reference 3) for each of the 2 hidden neurons
  7. Calculate the Gradient(Reference 4) for each weight (starting from the end and working back)
  8. Calculate the Delta Weight(Reference 5) for each weight, and add that to its value.
  9. Start the process over by changing the inputs and expected output (Reference 6)

Here are the specifics of those references to equations/processes (This is probably where my problem is!):

  1. x is the input of the neuron: (1/(1 + Math.pow(Math.E, (-1 * x))))
  2. -1 * (actualOutput - expectedOutput) * (Sigmoid(x) * (1 - Sigmoid(x))) // same Sigmoid used in Reference 1
  3. SigmoidDerivative(Neuron.input)*(The sum of(Neuron.Weights * the deltaError of the neuron they connect to))
  4. ParentNeuron.output * NeuronItConnectsTo.deltaError
  5. learningRate*(weight.gradient) + momentum*(Previous Delta Weight)
  6. I have an arrayList with the values 0,1,1,0 in it in that order. It takes the first pair(0,1), and then expects a 1. For the second time through, it takes the second pair(1,1) and expects a 0. It just keeps iterating through the list for each new set. Perhaps training it in this systematic way causes the problem?
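Put together, a single training step following the steps and references above might look like the sketch below for the 2-2-1 topology described. This is illustrative only (names like `predict` and `train` are not from the original code), it omits momentum, and it folds Reference 2's leading -1 into the sign convention so delta weights are simply added:

```java
import java.util.Random;

public class XorNet {
    // 2-2-1 topology with bias units, as described in the question
    static double[][] w1 = new double[2][3]; // hidden weights: [neuron][in0, in1, bias]
    static double[] w2 = new double[3];      // output weights: [hid0, hid1, bias]
    static double lr = 0.5;                  // learning rate

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); } // Reference 1
    static double sigmoidDeriv(double out) { return out * (1.0 - out); }   // takes the OUTPUT g(x), not x

    // Forward pass only (steps 1-4); h receives the hidden activations
    static double predict(double x0, double x1, double[] h) {
        for (int i = 0; i < 2; i++)
            h[i] = sigmoid(w1[i][0] * x0 + w1[i][1] * x1 + w1[i][2]);
        return sigmoid(w2[0] * h[0] + w2[1] * h[1] + w2[2]);
    }

    // One full training step (steps 1-8)
    static double train(double x0, double x1, double expected) {
        double[] h = new double[2];
        double out = predict(x0, x1, h);

        double dOut = (expected - out) * sigmoidDeriv(out);    // step 5 (Reference 2, sign folded in)
        double[] dHid = new double[2];
        for (int i = 0; i < 2; i++)
            dHid[i] = sigmoidDeriv(h[i]) * w2[i] * dOut;       // step 6 (Reference 3)

        for (int i = 0; i < 2; i++) w2[i] += lr * dOut * h[i]; // steps 7-8 (References 4-5)
        w2[2] += lr * dOut;                                    // bias neuron always outputs 1
        for (int i = 0; i < 2; i++) {
            w1[i][0] += lr * dHid[i] * x0;
            w1[i][1] += lr * dHid[i] * x1;
            w1[i][2] += lr * dHid[i];
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 3; j++) w1[i][j] = rng.nextDouble() * 5 - 2.5; // [-2.5, 2.5]
        for (int j = 0; j < 3; j++) w2[j] = rng.nextDouble() * 5 - 2.5;

        int[][] data = {{0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0}}; // XOR truth table
        for (int epoch = 0; epoch < 20000; epoch++)
            for (int[] s : data) train(s[0], s[1], s[2]);
        double[] h = new double[2];
        for (int[] s : data)
            System.out.println(s[0] + " XOR " + s[1] + " -> " + predict(s[0], s[1], h));
    }
}
```

Note the delta-weight signs: with the error written as (expected - out), all updates are additions; mixing this with the -1-prefixed form from Reference 2 would flip a sign somewhere, which matches the symptom fixed in EDIT 2.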

Like I said before, the reason I don't think it's a code problem is that the program matched exactly what I calculated with paper and pencil (which wouldn't have happened if there were a coding error).

Also, when I first initialize my weights, I give them a random double value between 0 and 1. This article suggests that may lead to a problem: Neural Network with backpropogation not converging. Could that be it? I used the n^(-1/2) rule, but that did not fix it.
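For reference, the n^(-1/2) rule mentioned above is usually read as drawing each weight uniformly from [-1/sqrt(fanIn), +1/sqrt(fanIn)], where fanIn is the number of inputs feeding the neuron (bias included). A minimal sketch, with `initWeights` as an illustrative name rather than anything from the question's code:

```java
import java.util.Random;

public class WeightInit {
    // Draws fanIn weights uniformly from [-1/sqrt(fanIn), +1/sqrt(fanIn)]
    static double[] initWeights(int fanIn, Random rng) {
        double bound = 1.0 / Math.sqrt(fanIn);
        double[] w = new double[fanIn];
        for (int i = 0; i < fanIn; i++)
            w[i] = (rng.nextDouble() * 2.0 - 1.0) * bound; // uniform in [-bound, +bound]
        return w;
    }

    public static void main(String[] args) {
        // a hidden neuron with 2 inputs plus a bias -> fanIn = 3
        double[] w = initWeights(3, new Random());
        for (double v : w) System.out.println(v);
    }
}
```

Either way, the key property is that the weights are centered on zero; drawing from [0, 1] makes every weight positive, which can stall training on XOR.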

If I can be more specific or you want other code let me know, thanks!


Answer 1:


This is wrong:

SigmoidDerivative(Neuron.input)*(The sum of(Neuron.Weights * the deltaError of the neuron they connect to))

The derivative should be taken on the neuron's activation, not its raw input. Below, the first function is the sigmoid activation (g) and the second is its derivative, written in terms of the activation gZ = g(z):

private double g(double z) {
    return 1 / (1 + Math.exp(-z)); // sigmoid activation
}

private double gD(double gZ) {
    return gZ * (1 - gZ); // derivative, in terms of the activation gZ = g(z)
}
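A quick way to see why the distinction matters (a standalone check, not from the answer's repository): gD expects the activation g(z), so feeding it the raw input z produces a nonsensical "derivative":

```java
public class SigmoidCheck {
    static double g(double z) { return 1.0 / (1.0 + Math.exp(-z)); }
    static double gD(double gZ) { return gZ * (1.0 - gZ); }

    public static void main(String[] args) {
        double z = 2.0;
        double a = g(z);        // activation, about 0.8808
        double right = gD(a);   // correct: g'(z) = g(z) * (1 - g(z)), about 0.105
        double wrong = gD(z);   // passing the raw input instead gives 2 * (1 - 2) = -2
        System.out.println("right = " + right + ", wrong = " + wrong);
    }
}
```

A derivative with the wrong sign or magnitude in the hidden-layer deltas is exactly the kind of error that leaves the output stuck near 0.5.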

Unrelated note: your notation (-1*x) is really strange; just use -x.

From how you phrase the steps, your implementation seems unstructured. Try to focus on implementing separate ForwardPropagation/BackPropagation and UpdateWeights methods, and on creating a Matrix class to keep the math readable.

This is my Java implementation; it's very simple and somewhat rough. I use a Matrix class so the math behind it appears very simple in code.

If you can code in C++, you can overload operators, which allows even easier writing of comprehensible code.

https://github.com/josephjaspers/ArtificalNetwork/blob/master/src/artificalnetwork/ArtificalNetwork.java


Here are the algorithms (C++)

All of this code can be found on my GitHub (the neural nets are simple and functional). Each layer includes the bias nodes, which is why there are offsets.

void NeuralNet::forwardPropagation(std::vector<double> data) {
    setBiasPropogation(); // sets every layer's bias node activation to 1
    a(0).set(1, Matrix(data)); // set(1, ...) offsets past the bias unit (A = X)

    for (int i = 1; i < layers; ++i) {
        z(i).set(1, w(i - 1) * a(i - 1)); // again offset by 1 for the bias unit
        a(i) = g(z(i)); // g(z) is the sigmoid function
    }
}
void NeuralNet::setBiasPropogation() {
    for (int i = 0; i < activation.size(); ++i) {
        a(i).set(0, 0, 1);
    }
}

Output layer:  D = A - Y  (Y is the target output)

Hidden layers: d^l = (w^l(T) * d^(l+1)) *: gD(a^l)

d = delta (error) vector

W = weight matrix (length = connections, width = features)

a = activation matrix

gD = derivative function

^l = not a power; it just means "at layer l"

• = dot product

*: = element-wise multiply (multiply each element "through")

cpy(n) returns a copy of the matrix offset by n (ignores the first n rows)

void NeuralNet::backwardPropagation(std::vector<double> output) {
    d(layers - 1) = a(layers - 1) - Matrix(output);
    for (int i = layers - 2; i > -1; --i) {
        d(i) = (w(i).T() * d(i + 1).cpy(1)).x(gD(a(i)));
    }
}

Explaining this code may be confusing without images, so I'm sending this link, which I think is a good source; it also contains an explanation of backpropagation that may be better than my own: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html

void NeuralNet::updateWeights() {
    // operator()(int l, int w) returns a double reference at that position in the matrix
    // operator[](int n) returns the nth double (reference) in the matrix (useful for vectors)
    for (int l = 0; l < layers - 1; ++l) {
        for (int i = 1; i < d(l + 1).length(); ++i) {
            for (int j = 0; j < a(l).length(); ++j) {
                w(l)(i - 1, j) -= (d(l + 1)[i] * a(l)[j]) * learningRate + m(l)(i - 1, j);
                m(l)(i - 1, j) = (d(l + 1)[i] * a(l)[j]) * learningRate * momentumRate;
            }
        }
    }
}


Source: https://stackoverflow.com/questions/39523744/xor-neural-networkff-converges-to-0-5
