I have a small, 3-layer neural network with two input neurons, two hidden neurons, and one output neuron. I am trying to stick to the format below, using only 2 hidden neurons.
This is because you have not included any bias for the neurons. You have used only weights to try to fit the XOR model.
With 2 neurons in the hidden layer, the network under-fits because it can't compensate for the missing bias.
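You can see the constraint directly: with no bias, the input (0, 0) always produces hidden activations of exactly 0.5, no matter what the weights are, which limits what the output layer can do. A minimal sketch (the weight values are arbitrary, just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)

# Without bias, the pre-activation for input (0, 0) is always 0,
# so every hidden neuron outputs sigmoid(0) = 0.5 regardless of weights.
for _ in range(5):
    W1 = rng.normal(size=(2, 2))            # random input->hidden weights, no bias
    h = sigmoid(np.array([0.0, 0.0]) @ W1)  # hidden activations for (0, 0)
    print(h)                                # always [0.5, 0.5]
```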
When you use 3 neurons in the hidden layer, the extra neuron counters the effect of the missing bias.
This is an example of a network for an XOR gate. You'll notice theta (bias) added to the hidden layers. This gives the network an additional parameter to tweak.
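To show that 2 hidden neurons are enough once theta is present, here is a sketch with hand-picked weights (the specific values are my own illustration, not from the network above): one hidden neuron approximates OR, the other AND, and the output computes OR-and-not-AND, i.e. XOR.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked weights (illustrative, not learned):
# h1 ~ OR(x1, x2), h2 ~ AND(x1, x2), out ~ h1 AND NOT h2 = XOR.
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
theta1 = np.array([-10.0, -30.0])   # biases (theta) for the two hidden neurons
W2 = np.array([[20.0], [-20.0]])
theta2 = np.array([-10.0])          # bias (theta) for the output neuron

h = sigmoid(X @ W1 + theta1)        # hidden layer
out = sigmoid(h @ W2 + theta2)      # output layer
print((out > 0.5).astype(int).ravel())  # [0 1 1 0]
```

Without theta1 and theta2 there is no way to shift the decision thresholds of the two hidden units apart, which is exactly why the bias-free 2-neuron network under-fits.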
Additional resources