Can't approximate simple multiplication function in neural network with 1 hidden layer

The large gradients produced by a big multiplication target probably force the net almost immediately into a degenerate state where all of its hidden nodes have zero gradient. We can use two approaches:

1) Divide by a constant. We simply divide everything by a constant before training and multiply back afterwards.

2) Use log-normalization, which turns multiplication into addition (a rough sketch follows the formula below):

m = x*y => ln(m) = ln(x) + ln(y).
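
As a minimal sketch of approach 2 (my own illustration, assuming Keras and synthetic positive inputs, not code from the original post): train on the log-transformed inputs and target, then exponentiate the prediction to recover the product.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential

# Synthetic positive inputs (an assumption: the log trick needs x, y > 0).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=(10000, 2))
y = x[:, 0] * x[:, 1]

# Log-normalization: in log space the net only has to learn ln(x1) + ln(x2).
X_log = np.log(x)
y_log = np.log(y)

model = Sequential([
    Input(shape=(2,)),
    Dense(8, activation='relu'),
    Dense(1, activation='linear'),   # linear output + squared error: plain regression
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_log, y_log, epochs=30, batch_size=64, verbose=0)

# Undo the transform to recover the product.
print(np.exp(model.predict(np.log([[7.0, 6.0]]), verbose=0)))  # should be close to 42
```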

Some things to check:

  1. Your output layer should have a linear activation function. If it's sigmoidal, it won't be able to represent values outside its range (e.g. -1 to 1).
  2. You should use a loss function that's appropriate for regression (e.g. squared error)
  3. If your hidden layer uses sigmoidal activation functions, check that you're not saturating them. Multiplication can work on arbitrarily small/large values, and if you pass a large number as input you can get saturation, which will lose information. If you're using ReLUs, make sure they're not getting stuck at 0 on all examples (although activations will generally be sparse on any given example).
  4. Check that your training procedure is working as intended. Plot the error over time during training. How does it look? Are your gradients well behaved, or are they blowing up? One source of problems is the learning rate being set too high (unstable error, exploding gradients) or too low (very slow progress, error doesn't decrease quickly enough). A rough diagnostic sketch follows this list.
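
The following standalone sketch (my own illustration, not from the answer above; it assumes TensorFlow/Keras) trains directly on the raw product and prints the loss and global gradient norm at each step, which is one way to check points 1, 2 and 4:

```python
import numpy as np
import tensorflow as tf

# Synthetic data: the raw product as a regression target.
rng = np.random.default_rng(0)
x = rng.uniform(-10.0, 10.0, size=(5000, 2)).astype("float32")
y = (x[:, 0] * x[:, 1]).reshape(-1, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(1, activation="linear"),  # point 1: linear output
])
loss_fn = tf.keras.losses.MeanSquaredError()        # point 2: regression loss
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Point 4: watch the error and the gradient norm over time; wild oscillation
# or exploding norms suggest the learning rate is too high.
for step in range(20):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    print(step, float(loss), float(tf.linalg.global_norm(grads)))
```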

m = x*y => ln(m) = ln(x) + ln(y), but only if x, y > 0
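
If you do need negative inputs, one common workaround (my addition, not something proposed in the answers above) is to learn the log of the magnitudes and carry the sign separately:

```python
import numpy as np

# The log is only defined for positive values, so split the product into
# magnitude (learned in log space) and sign (recombined afterwards).
def signed_log_product_targets(x1, x2, eps=1e-12):
    log_mag = np.log(np.abs(x1) + eps) + np.log(np.abs(x2) + eps)
    sign = np.sign(x1) * np.sign(x2)
    return log_mag, sign  # recover the product as sign * exp(log_mag)

log_mag, sign = signed_log_product_targets(np.array([-3.0]), np.array([4.0]))
print(sign * np.exp(log_mag))  # approximately -12
```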

"Two approaches: divide by constant, or make log normalization"

I've tried both approaches. Log normalization certainly works since, as you rightly point out, it forces the network to implement addition. Dividing by a constant, or similarly normalizing across any range, did not succeed in my extensive testing.

The log approach is fine, but if you have two datasets with a set of inputs and a target y value where:

  • In dataset one the target is consistently a sum of two of the inputs

  • In dataset two the target is consistently the product of two of the inputs

Then it's not clear to me how to design a neural network that will find the target y in both datasets using backpropagation. If this isn't possible, I find it a surprising limitation in the ability of a neural network to find "an approximation to any function". But I'm new to this game, and my expectations may be unrealistic.

Here is one way you could approximate the multiplication function using one hidden layer. It uses a sigmoidal activation in the hidden layer, and it works quite nicely within a certain range of numbers. This is the gist link
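
The gist itself isn't reproduced here, so the following is only a rough reconstruction of what such a one-hidden-layer sigmoid network could look like (my own sketch in Keras; the actual gist may differ):

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Sequential

# Train a single sigmoid hidden layer to approximate x * y on a limited range.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(20000, 2))
y = x[:, 0] * x[:, 1]

model = Sequential([
    Input(shape=(2,)),
    Dense(50, activation='sigmoid'),  # the single hidden layer
    Dense(1, activation='linear'),    # linear output for regression
])
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=100, batch_size=128, verbose=0)

print(model.predict(np.array([[0.5, -0.4]]), verbose=0))  # should be close to -0.2
```

As the answer notes, this only holds within the range of values the network was trained on; outside that range the approximation degrades quickly.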
