NaN loss when training regression network

渐次进展 · 2020-11-29 16:28

I have a data matrix in "one-hot encoding" (all ones and zeros) with 260,000 rows and 35 columns. I am using Keras to train a simple neural network to predict a continuous variable.

17 Answers
    谎友^ (OP)
    2020-11-29 16:46

    To sum up the different solutions mentioned here and in this github discussion, which will, of course, depend on your particular situation (a combined code sketch applying several of them follows the list):

    • Add l1 or l2 regularization penalties to the weights. If regularization is already present, try a smaller penalty, e.g. l2(0.001), or remove it entirely.
    • Try a smaller Dropout rate.
    • Clip the gradients to prevent them from exploding. For instance, in Keras you could pass clipnorm=1.0 or clipvalue=1.0 as parameters to your optimizer.
    • Check the validity of your inputs (no NaNs and, in some cases, no 0s), e.g. with df.isnull().any().
    • Replace your optimizer with Adam, which is easier to tune. Sometimes replacing sgd with rmsprop also helps.
    • Use RMSProp with heavy regularization to prevent gradient explosion.
    • Try normalizing your data, or inspect your normalization process for any bad values introduced.
    • Verify that you are using the right activation function (e.g. a softmax instead of a sigmoid for multi-class classification).
    • Try increasing the batch size (e.g. from 32 to 64 or 128) to improve the stability of your optimization.
    • Try decreasing your learning rate.
    • Check the size of your last batch, which may differ from the nominal batch size.
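
    As a minimal sketch of how several of these fixes can be combined, the snippet below checks the inputs for NaNs, normalizes the target, adds a small l2 penalty and dropout, and uses Adam with a reduced learning rate and clipnorm gradient clipping. The file name train.csv, the column layout, the layer sizes, and the hyperparameters are illustrative assumptions, not part of the original answer.

        # Minimal sketch, assuming the 35 one-hot feature columns and the
        # continuous target live together in a hypothetical train.csv.
        import pandas as pd
        from tensorflow import keras
        from tensorflow.keras import layers, regularizers

        df = pd.read_csv("train.csv")                  # placeholder path
        assert not df.isnull().any().any(), "NaNs in the input data"

        X = df.iloc[:, :35].to_numpy("float32")        # 35 one-hot feature columns
        y = df.iloc[:, 35].to_numpy("float32")         # continuous target (assumed last column)

        # Normalize the target so the MSE loss starts in a sane range.
        y_mean, y_std = y.mean(), y.std()
        y = (y - y_mean) / y_std

        model = keras.Sequential([
            layers.Input(shape=(35,)),
            layers.Dense(64, activation="relu",
                         kernel_regularizer=regularizers.l2(1e-3)),  # small l2 penalty
            layers.Dropout(0.1),                                     # modest dropout rate
            layers.Dense(1),                                         # linear output for regression
        ])

        # Adam with a reduced learning rate and gradient clipping.
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0),
                      loss="mse")

        # Larger batch size for more stable gradient estimates.
        model.fit(X, y, batch_size=128, epochs=10, validation_split=0.1)

    Note that predictions from this model come back in normalized units; rescale them with y_pred * y_std + y_mean to compare against the original target.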
