I can't get TensorFlow ReLU activations (neither tf.nn.relu nor tf.nn.relu6) working without NaN values for activations and weights killing my training.
Following He et al. (as suggested in lejlot's comment), initializing the weights of the l-th layer to a zero-mean Gaussian distribution with standard deviation sqrt(2 / n_l), where n_l is the flattened length of the input vector, i.e.
```python
stddev = np.sqrt(2 / np.prod(input_tensor.get_shape().as_list()[1:]))
```
results in weights that generally do not diverge.
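For reference, here is a minimal sketch of how such a weight variable can be created in TF1-style code; the helper name `he_weights` and the `n_out` parameter are illustrative, not part of my original setup:

```python
import numpy as np
import tensorflow as tf

def he_weights(input_tensor, n_out, name="weights"):
    # Fan-in: product of all dimensions except the batch axis.
    n_in = int(np.prod(input_tensor.get_shape().as_list()[1:]))
    # He et al. (2015): zero-mean Gaussian with stddev = sqrt(2 / fan_in),
    # chosen so the variance of ReLU activations stays roughly constant
    # from layer to layer.
    stddev = np.sqrt(2.0 / n_in)
    return tf.Variable(
        tf.truncated_normal([n_in, n_out], mean=0.0, stddev=stddev),
        name=name)
```

With a placeholder `x` for the (flattened) input, a first layer's weights would then be created as, e.g., `W1 = he_weights(x, 128)`.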