Why are my TensorFlow network weights and costs NaN when I use RELU activations?

天命终不由人 · 2021-02-01 10:16

I can't get TensorFlow RELU activations (neither tf.nn.relu nor tf.nn.relu6) working without NaN values for activations and weights killing my training.

3 Answers
  •  灰色年华 · 2021-02-01 10:56

    Following He et al. (as suggested in lejlot's comment), initializing the weights of the l-th layer to a zero-mean Gaussian distribution with standard deviation sqrt(2 / n_l), where n_l is the flattened length of the input vector, or, in TensorFlow:

        stddev = np.sqrt(2 / np.prod(input_tensor.get_shape().as_list()[1:]))

    results in weights that generally do not diverge.
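
    For concreteness, here is a minimal sketch of that scheme as a reusable helper, assuming the TF 1.x API used in the snippet above; the name he_weights and the n_out parameter are illustrative, not from the original answer:

        import numpy as np
        import tensorflow as tf

        def he_weights(input_tensor, n_out):
            # Fan-in: flattened length of everything but the batch dimension.
            n_in = int(np.prod(input_tensor.get_shape().as_list()[1:]))
            # He et al.: zero-mean Gaussian with stddev = sqrt(2 / fan_in).
            initial = tf.truncated_normal([n_in, n_out],
                                          stddev=np.sqrt(2.0 / n_in))
            return tf.Variable(initial)

    In TF 2.x the same scheme is built in as tf.keras.initializers.HeNormal.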
