Question
I just wanted to test how well a neural network can approximate the multiplication function (a regression task). I am using Azure Machine Learning Studio. I have 6500 samples, 1 hidden layer (I have tested 5 / 30 / 100 neurons per hidden layer), and no normalization, with the default parameters: learning rate 0.005, number of learning iterations 200, initial learning weights 0.1, momentum 0. I got extremely bad accuracy, close to 0. At the same time, Boosted Decision Forest Regression shows a very good approximation.
What am I doing wrong? This task should be very easy for a NN.
Answer 1:
The big gradient of the multiplication function probably forces the net almost immediately into some horrifying state where all its hidden nodes have zero gradient. There are two approaches we can use:
1) Divide by a constant. We simply divide everything before learning and multiply back afterwards.
2) Use log-normalization. It turns multiplication into addition:
m = x*y => ln(m) = ln(x) + ln(y).
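For instance, here is a minimal sketch of the log-normalization idea in Python, using scikit-learn's MLPRegressor as a stand-in for the Azure module (the data range, network size, and hyperparameters are illustrative assumptions, not the asker's exact setup):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # stand-in for the Azure NN module (assumption)

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=(6500, 2))   # positive inputs: the log trick requires x, y > 0
y = x[:, 0] * x[:, 1]                         # target: plain multiplication

# Log-normalization turns the product into a sum: ln(m) = ln(x) + ln(y)
X_log, y_log = np.log(x), np.log(y)

net = MLPRegressor(hidden_layer_sizes=(5,), activation="tanh",
                   solver="adam", max_iter=2000, random_state=0)
net.fit(X_log, y_log)

# Undo the transform at prediction time
pred = np.exp(net.predict(X_log))
print("mean relative error:", np.mean(np.abs(pred - y) / y))
```

Note that the inputs are kept positive here, since the log transform is only defined for x, y > 0.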
Answer 2:
Some things to check:
- Your output layer should have a linear activation function. If it's sigmoidal, it won't be able to represent values outside its range (e.g. -1 to 1)
- You should use a loss function that's appropriate for regression (e.g. squared error)
- If your hidden layer uses sigmoidal activation functions, check that you're not saturating them. Multiplication can involve arbitrarily small or large values, and if you pass a large number as input you can get saturation, which loses information. If using ReLUs, make sure they're not getting stuck at 0 on all examples (although activations will generally be sparse on any given example).
- Check that your training procedure is working as intended. Plot the error over time during training. How does it look? Are your gradients well behaved, or are they blowing up? One source of problems can be a learning rate that is set too high (unstable error, exploding gradients) or too low (very slow progress, error doesn't decrease quickly enough). A minimal sketch of these checks follows this list.
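Here is a minimal NumPy sketch of those checks (the data range, layer size, and learning rate below are arbitrary assumptions): a tanh hidden layer, a linear output, squared-error loss, and a printout of the loss and a gradient norm during training so you can see whether learning is stable or blowing up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(6500, 2))    # small input range to avoid saturating tanh
y = (X[:, 0] * X[:, 1]).reshape(-1, 1)        # regression target

# One tanh hidden layer, a *linear* output layer, squared-error loss
n_hidden, lr = 30, 0.05
W1 = rng.normal(0, 0.5, (2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(500):
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    out = h @ W2 + b2                         # linear output
    err = out - y
    loss = np.mean(err ** 2)                  # squared error, appropriate for regression

    # Backpropagation
    d_out = 2 * err / len(X)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)       # tanh derivative
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # If the loss oscillates or the gradient norm explodes, the learning rate is too high;
    # if the loss barely moves, it is too low.
    if epoch % 50 == 0:
        print(f"epoch {epoch:3d}  loss {loss:.5f}  |dW1| {np.linalg.norm(dW1):.4f}")

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```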
Answer 3:
m = x*y => ln(m) = ln(x) + ln(y), but only if x, y > 0
Answer 4:
"Two approaches: divide by constant, or make log normalization"
I've tried both approaches. Certainly, log normalization works, since as you rightly point out it forces an implementation of addition. Dividing by a constant -- or similarly normalizing across any range -- seems not to succeed in my extensive testing.
The log approach is fine, but if you have two datasets with a set of inputs and a target y value where:
- In dataset one the target is consistently a sum of two of the inputs
- In dataset two the target is consistently the product of two of the inputs
Then it's not clear to me how to design a neural network that will find the target y in both datasets using backpropagation. If this isn't possible, then I find it a surprising limitation in the ability of a neural network to find "an approximation to any function". But I'm new to this game, and my expectations may be unrealistic.
Answer 5:
Here is one way you could approximate the multiplication function using one hidden layer. It uses a sigmoidal activation in the hidden layer, and it works quite nicely up to a certain range of numbers. This is the gist link.
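The gist itself is not reproduced here; as a rough sketch of the same idea (one sigmoidal hidden layer trained directly on the product over a limited input range), something like the following could be used, with scikit-learn's MLPRegressor and hyperparameters that are my own assumptions rather than whatever the gist actually uses:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor  # illustrative stand-in; the original gist may differ

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(6500, 2))   # limited input range: the fit degrades outside it
y = X[:, 0] * X[:, 1]

# One sigmoidal ("logistic") hidden layer; MLPRegressor's output layer is linear
net = MLPRegressor(hidden_layer_sizes=(100,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0)
net.fit(X, y)

print("train MSE:", np.mean((net.predict(X) - y) ** 2))
print("2 * 3 ~=", net.predict([[2.0, 3.0]])[0])
```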
Source: https://stackoverflow.com/questions/37520849/cant-approximate-simple-multiplication-function-in-neural-network-with-1-hidden