I can't get TensorFlow ReLU activations (neither tf.nn.relu nor tf.nn.relu6) working without NaN values for activations and weights killing my training.
Following He et al. (as suggested in lejlot's comment), initializing the weights of the l-th layer to a zero-mean Gaussian distribution with standard deviation sqrt(2 / n_l), where n_l is the flattened length of the input vector, i.e.
```python
stddev = np.sqrt(2 / np.prod(input_tensor.get_shape().as_list()[1:]))
```
results in weights that generally do not diverge.
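For reference, here is a minimal sketch of how such a weight variable can be created in TF1-style code; the helper name `he_weights` and the `n_out` parameter are illustrative, not part of my original setup:

```python
import numpy as np
import tensorflow as tf

def he_weights(input_tensor, n_out, name="weights"):
    # Fan-in: product of all dimensions except the batch axis.
    n_in = int(np.prod(input_tensor.get_shape().as_list()[1:]))
    # He et al. (2015): zero-mean Gaussian with stddev = sqrt(2 / fan_in),
    # chosen so the variance of ReLU activations stays roughly constant
    # from layer to layer.
    stddev = np.sqrt(2.0 / n_in)
    return tf.Variable(
        tf.truncated_normal([n_in, n_out], mean=0.0, stddev=stddev),
        name=name)
```

With a placeholder `x` for the (flattened) input, a first layer's weights would then be created as, e.g., `W1 = he_weights(x, 128)`.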