I can't get TensorFlow ReLU activations (neither tf.nn.relu nor tf.nn.relu6) working without NaN values for activations and weights killing my training.
Following He et al. (as suggested in lejlot's comment), initializing the weights of the l-th layer to a zero-mean Gaussian distribution with standard deviation sqrt(2 / n_l), where n_l is the flattened length of the input vector, or
stddev=np.sqrt(2 / np.prod(input_tensor.get_shape().as_list()[1:]))
results in weights that generally do not diverge.
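For completeness, a minimal sketch of how this initialization might be wired into a layer, assuming the TF1-style graph API used in the snippet above; the function name he_initialized_layer and the parameter n_out are illustrative, not from the original post:

    import numpy as np
    import tensorflow as tf

    def he_initialized_layer(input_tensor, n_out):
        """Fully connected ReLU layer with He-style weight initialization (illustrative)."""
        # Flattened length of the input: product of all dimensions except the batch dimension.
        n_in = int(np.prod(input_tensor.get_shape().as_list()[1:]))
        flat = tf.reshape(input_tensor, [-1, n_in])
        # He et al.: zero-mean Gaussian with stddev = sqrt(2 / n_in).
        stddev = np.sqrt(2 / n_in)
        weights = tf.Variable(tf.truncated_normal([n_in, n_out], stddev=stddev))
        biases = tf.Variable(tf.zeros([n_out]))
        return tf.nn.relu(tf.matmul(flat, weights) + biases)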