NaN loss when training regression network

渐次进展 · 2020-11-29 16:28

I have a data matrix in "one-hot encoding" (all ones and zeros) with 260,000 rows and 35 columns. I am using Keras to train a simple neural network to predict a continuous variable.

17 Answers
    谎友^ (OP)
    2020-11-29 16:46

    To sum up the different solutions mentioned here and in this github discussion, which will, of course, depend on your particular situation (a combined code sketch applying several of them follows the list):

    • Add l1 or l2 regularization penalties to the weights. If regularization is already present, try a smaller penalty, e.g. l2(0.001), or remove it entirely.
    • Try a smaller Dropout rate.
    • Clip the gradients to prevent them from exploding. For instance, in Keras you could pass clipnorm=1.0 or clipvalue=1.0 as parameters to your optimizer.
    • Check the validity of your inputs (no NaNs and, in some cases, no 0s), e.g. with df.isnull().any().
    • Replace your optimizer with Adam, which is easier to tune. Sometimes replacing sgd with rmsprop also helps.
    • Use RMSProp with heavy regularization to prevent gradient explosion.
    • Try normalizing your data, or inspect your normalization process for any bad values introduced.
    • Verify that you are using the right activation function (e.g. a softmax instead of a sigmoid for multi-class classification).
    • Try increasing the batch size (e.g. from 32 to 64 or 128) to improve the stability of your optimization.
    • Try decreasing your learning rate.
    • Check the size of your last batch, which may differ from the nominal batch size.
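
    As a minimal sketch of how several of these fixes can be combined, the snippet below checks the inputs for NaNs, normalizes the target, adds a small l2 penalty and dropout, and uses Adam with a reduced learning rate and clipnorm gradient clipping. The file name train.csv, the column layout, the layer sizes, and the hyperparameters are illustrative assumptions, not part of the original answer.

        # Minimal sketch, assuming the 35 one-hot feature columns and the
        # continuous target live together in a hypothetical train.csv.
        import pandas as pd
        from tensorflow import keras
        from tensorflow.keras import layers, regularizers

        df = pd.read_csv("train.csv")                  # placeholder path
        assert not df.isnull().any().any(), "NaNs in the input data"

        X = df.iloc[:, :35].to_numpy("float32")        # 35 one-hot feature columns
        y = df.iloc[:, 35].to_numpy("float32")         # continuous target (assumed last column)

        # Normalize the target so the MSE loss starts in a sane range.
        y_mean, y_std = y.mean(), y.std()
        y = (y - y_mean) / y_std

        model = keras.Sequential([
            layers.Input(shape=(35,)),
            layers.Dense(64, activation="relu",
                         kernel_regularizer=regularizers.l2(1e-3)),  # small l2 penalty
            layers.Dropout(0.1),                                     # modest dropout rate
            layers.Dense(1),                                         # linear output for regression
        ])

        # Adam with a reduced learning rate and gradient clipping.
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0),
                      loss="mse")

        # Larger batch size for more stable gradient estimates.
        model.fit(X, y, batch_size=128, epochs=10, validation_split=0.1)

    Note that predictions from this model come back in normalized units; rescale them with y_pred * y_std + y_mean to compare against the original target.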
