How Many Epochs Should a Neural Net Need to Learn to Square? (Testing Results Included)

Submitted by 不羁的心 on 2019-12-06 03:25:53

One of the simplest things you can do is compute an XOR function; this is what I normally do to test "normal" multilayer perceptrons. With a learning rate of 0.2, the XOR problem is solved nearly perfectly (99% averaged accuracy) in fewer than 100 epochs with a 2-5-1 network.
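A minimal sketch of such a 2-5-1 XOR network in plain NumPy. The architecture and the learning rate of 0.2 follow the description above; everything else (tanh activations, mean-squared-error loss, full-batch updates, the random seed) is an illustrative assumption, not the original poster's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# 2-5-1 network: small random weights, zero biases (assumed init)
W1 = rng.uniform(-0.5, 0.5, (2, 5)); b1 = np.zeros((1, 5))
W2 = rng.uniform(-0.5, 0.5, (5, 1)); b2 = np.zeros((1, 1))
lr = 0.2

losses = []
for epoch in range(2000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)        # hidden layer (5 neurons)
    out = np.tanh(h @ W2 + b2)      # output layer (1 neuron)
    err = out - y
    losses.append(float((err ** 2).mean()))
    # Backpropagation: d tanh(z)/dz = 1 - tanh(z)^2
    d_out = err * (1 - out ** 2)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(losses[0], losses[-1])
```

With full-batch gradient descent the squared error should drop steadily over training; how many epochs it takes to reach 99% accuracy depends on the initialization.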

I tried your problem with an MLP I coded myself (tanh activations; no bias neuron, but a bias value for each neuron; weights initialized between 0.1 and 0.5; biases all initialized to 0.5; 1,000 training samples ranging from 0.001 to 2.0; activation normalization, meaning the net input of every non-input neuron is divided by the number of neurons in the parent layer; 1-5-1 neurons) and got 95% averaged accuracy in fewer than 2,000 epochs every time with a learning rate of 0.1.
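A forward-pass sketch of the activation normalization described above: the net input of each non-input layer is divided by the number of neurons in the parent (previous) layer, which keeps tanh out of its flat saturation region. The weight range, bias init, layer sizes, and input range follow the text; the helper function and its name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, W, b):
    """One tanh layer whose net input is divided by the parent layer's size."""
    fan_in = W.shape[0]              # number of neurons in the parent layer
    return np.tanh((x @ W + b) / fan_in)

# 1-5-1 network: weights in [0.1, 0.5], biases 0.5 each (as in the text)
W1 = rng.uniform(0.1, 0.5, (1, 5)); b1 = np.full((1, 5), 0.5)
W2 = rng.uniform(0.1, 0.5, (5, 1)); b2 = np.full((1, 1), 0.5)

x = np.linspace(0.001, 2.0, 1000).reshape(-1, 1)  # 1,000 training inputs
h = layer(x, W1, b1)                               # divided by 1 (input layer size)
out = layer(h, W2, b2)                             # divided by 5 (hidden layer size)
print(out.shape)  # → (1000, 1)
```

For the input layer the division has no effect (fan-in of 1); for the output layer it shrinks the net input by a factor of 5 before tanh is applied.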

Several factors can explain this. For my network, an input range of 0.001 to 1.0 needed about twice as many epochs to learn. Also, the activation normalization mentioned above drastically reduces the number of epochs needed to learn a given problem in most cases.

In addition, I have had mostly positive experiences with a bias value per neuron instead of one bias neuron per layer.

Furthermore, if your learning rate is too high and you train for many epochs, you risk overfitting.

This is a bit of necroposting, but I thought it would be useful for people new to neural networks.

For benchmarking neural networks and machine learning models in general, one common choice is the MONK dataset and its related paper by Thrun, Fahlman, et al., which you can download at

http://robots.stanford.edu/papers/thrun.MONK.html

It consists of three easy classification problems, each solved with different machine learning models.

If you look at the neural network chapters, you can see how the input was encoded, which hyperparameters were set (such as number of neurons or learning rate), and what the results were, so you can easily benchmark your own implementation from there.
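As one illustration of the kind of input encoding discussed there, here is a hedged sketch of the 17-bit one-hot encoding commonly used for the MONK's problems, based on the attribute cardinalities 3, 3, 2, 3, 4, 2; check the paper's neural network chapter for the exact scheme it used. The sample line below follows the dataset's file format (class, six attributes, example id), and the function name is my own.

```python
def encode_monk(line):
    """One-hot encode a line of a MONK data file into 17 binary inputs."""
    fields = line.split()
    label = int(fields[0])                     # class: 0 or 1
    attrs = [int(v) for v in fields[1:7]]      # attributes a1..a6
    cards = [3, 3, 2, 3, 4, 2]                 # possible values per attribute
    bits = []
    for value, card in zip(attrs, cards):
        one_hot = [0] * card
        one_hot[value - 1] = 1                 # attribute values are 1-indexed
        bits.extend(one_hot)
    return bits, label

x, y = encode_monk("1 1 1 1 1 3 1 data_1")
print(len(x), y)  # → 17 1
```

Feeding these 17 binary inputs to your own MLP lets you compare directly against the hyperparameters and results reported in the paper.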

I think it's a bit more robust than the XOR problem (I speak from experience: when I first implemented a neural network, my faulty implementation happened to solve the XOR problem but failed on the MONK problems).
