Why are my TensorFlow network weights and costs NaN when I use ReLU activations?

天命终不由人 · 2021-02-01 10:16

I can't get TensorFlow ReLU activations (neither tf.nn.relu nor tf.nn.relu6) working without NaN values for activations and weights killing my training.

3 Answers
    灰色年华 · 2021-02-01 11:22

    Have you tried gradient clipping and/or a smaller learning rate?

    Basically, you need to process your gradients before applying them, roughly as follows (adapted mostly from the TensorFlow docs):

    import tensorflow as tf

    # Replace this:
    # opt = tf.train.MomentumOptimizer(0.02, momentum=0.5).minimize(cross_entropy_loss)
    # with the following.

    # Create an optimizer with a smaller learning rate.
    opt = tf.train.MomentumOptimizer(learning_rate=0.001, momentum=0.5)

    # Compute the gradients for a list of variables.
    grads_and_vars = opt.compute_gradients(cross_entropy_loss, tf.trainable_variables())

    # grads_and_vars is a list of (gradient, variable) tuples.  Do whatever you
    # need to the 'gradient' part, for example cap each gradient to [-5, 5].
    capped_grads_and_vars = [(tf.clip_by_value(grad, -5., 5.), var)
                             for grad, var in grads_and_vars]

    # Ask the optimizer to apply the capped gradients.
    train_op = opt.apply_gradients(capped_grads_and_vars)
    

    Also, the discussion in this question might help.
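
    If capping each gradient element separately feels too blunt, clipping by global norm is a common alternative: it rescales all gradients together so their direction is preserved. A minimal sketch, assuming the same TF1-style graph and an existing `cross_entropy_loss` tensor (this is not from the original answer):

    import tensorflow as tf

    # Same optimizer as above, with a conservative learning rate.
    opt = tf.train.MomentumOptimizer(learning_rate=0.001, momentum=0.5)

    # Split the (gradient, variable) pairs so the gradients can be rescaled together.
    grads, variables = zip(*opt.compute_gradients(cross_entropy_loss, tf.trainable_variables()))

    # Rescale all gradients so their combined L2 norm is at most 5.0.
    clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)

    # Apply the rescaled gradients.
    train_op = opt.apply_gradients(list(zip(clipped_grads, variables)))

    Global-norm clipping keeps the relative scale between layers' gradients intact, which often behaves better than per-value capping when only a few layers blow up.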
