Why are my TensorFlow network weights and costs NaN when I use RELU activations?

天命终不由人 2021-02-01 10:16

I can't get TensorFlow ReLU activations (neither tf.nn.relu nor tf.nn.relu6) working without NaN values for activations and weights killing my training...

3 Answers

灰色年华 (OP)

2021-02-01 11:22

Have you tried gradient clipping and/or a smaller learning rate?

Basically, you need to process the gradients before applying them, as follows (adapted mostly from the TensorFlow docs):

import tensorflow as tf  # TF 1.x API (available as tf.compat.v1 in TF 2.x)

# Replace this with what follows
# opt = tf.train.MomentumOptimizer(0.02, momentum=0.5).minimize(cross_entropy_loss)

# Create an optimizer, here with a smaller learning rate than before.
opt = tf.train.MomentumOptimizer(learning_rate=0.001, momentum=0.5)

# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(cross_entropy_loss, tf.trainable_variables())

# grads_and_vars is a list of (gradient, variable) tuples. Do whatever you
# need to the gradient part, for example cap it. Variables that do not
# affect the loss have a gradient of None, so skip those.
capped_grads_and_vars = [(tf.clip_by_value(g, -5., 5.), v)
                         for g, v in grads_and_vars if g is not None]

# Ask the optimizer to apply the capped gradients. Run train_op in your
# session; opt stays bound to the optimizer instead of being overwritten.
train_op = opt.apply_gradients(capped_grads_and_vars)
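
To see why capping gradients or lowering the learning rate helps, here is a minimal pure-Python sketch, not TensorFlow code. It fits a single weight through a ReLU with plain gradient descent. The data points, the starting weight, the `train` helper, and the deliberately oversized learning rate of 0.5 are all assumptions made up for illustration; the clipping step only mimics what tf.clip_by_value does to each gradient entry.

```python
import math


def relu(z):
    return max(0.0, z)


def train(learning_rate, clip=None, steps=1000):
    """Fit one weight w so that relu(w * x) approximates y (squared error)."""
    data = [(3.0, 1.0), (-3.0, 1.0)]  # toy (x, y) pairs, purely illustrative
    w = 1.0
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            pre = w * x
            if pre > 0.0:  # ReLU passes gradient only where it is active
                grad += (relu(pre) - y) * x
        if clip is not None:
            # Element-wise cap, analogous to tf.clip_by_value(grad, -clip, clip).
            grad = max(-clip, min(clip, grad))
        w -= learning_rate * grad
    return w


print(train(0.5))             # step too large: w overflows and ends up as nan
print(train(0.5, clip=5.0))   # same step size, capped gradient: w stays finite
print(train(0.05))            # smaller step, no clipping: w settles near 1/3
print(train(0.05, clip=5.0))  # both together: also settles near 1/3
```

In this toy setup, clipping alone keeps the weight finite but it still bounces between two values, because the step is too large to settle. That is one reason to try a smaller learning rate together with clipping rather than relying on clipping alone.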

Also, the discussion in this question might help.
