How does one debug NaN values in TensorFlow?

遥遥无期 2020-12-23 09:07

I was running TensorFlow and something is yielding a NaN. I'd like to know what it is, but I do not know how to do this. The main issue is that in a "normal"

9 answers

春和景丽 (original poster)

2020-12-23 10:09
NaNs that occur in the forward pass are one thing; those that occur in the backward pass are another.

Step 0: data

Use NumPy tools to make sure the prepared dataset contains no extreme inputs, such as NaN inputs or negative labels. For instance: assert not np.any(np.isnan(x)).
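A minimal sketch of such a check, assuming the features and integer class labels are already loaded as NumPy arrays. The helper name find_data_problems and the num_classes argument are hypothetical, introduced only for illustration.

```python
import numpy as np

def find_data_problems(features, labels, num_classes):
    """Return a list of problems found; an empty list means the data looks sane."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    problems = []
    # np.isfinite is False for NaN, +inf and -inf, so one test covers all three.
    if not np.all(np.isfinite(features)):
        problems.append("features contain NaN or inf")
    if np.any(labels < 0):
        problems.append("negative labels")
    if np.any(labels >= num_classes):
        problems.append("labels >= num_classes")
    return problems

x = np.array([[0.1, 0.2], [0.3, np.nan]])
y = np.array([0, -1])
print(find_data_problems(x, y, num_classes=2))
```

Returning a list rather than asserting on the first failure makes it easier to see every problem in one pass over the data.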

Step 1: the forward

Switch to a CPU environment to get a more detailed traceback, and test the forward pass alone by adding loss = tf.stop_gradient(loss) before the gradients are calculated. Check whether you can run several batches without errors. If an error occurs, these are the usual causes and ways to track them down:
1. A 0 inside the log of a cross-entropy loss function (please refer to this answer).
2. A 0/0 division.
3. Labels that fall outside the valid class range, as reported here.
4. Try tensor = tf.check_numerics(tensor, 'tensor') in suspicious places (in TensorFlow 2 the function is tf.debugging.check_numerics).
5. Try tf_debug as described in this answer.
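The first, second, and fourth items can be illustrated with a small sketch. It assumes TensorFlow 2.x with eager execution, where check_numerics lives under tf.debugging; the tensors and the eps value are made-up illustration data, not taken from the question.

```python
import tensorflow as tf

probs = tf.constant([[1.0, 0.0], [0.5, 0.5]])   # predicted probabilities
labels = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # one-hot targets

# Item 1: log(0) is -inf and 0 * -inf is NaN, so the naive loss breaks.
naive = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)

# Clipping keeps the argument of the log strictly positive.
eps = 1e-7  # assumed value, tune for your dtype
clipped = tf.clip_by_value(probs, eps, 1.0)
safe = -tf.reduce_sum(labels * tf.math.log(clipped), axis=-1)

# Item 2: divide_no_nan returns 0 wherever the denominator is 0.
ratio = tf.math.divide_no_nan(tf.constant([0.0, 1.0]), tf.constant([0.0, 2.0]))

# Item 4: check_numerics raises as soon as a tensor holds NaN or Inf.
try:
    tf.debugging.check_numerics(naive, "naive cross-entropy")
    caught = False
except tf.errors.InvalidArgumentError:
    caught = True

print(naive.numpy(), safe.numpy(), ratio.numpy(), caught)
```

Clipping changes the loss value slightly near 0, so a built-in loss that works from logits may be the better long-term fix; the sketch only shows where the NaN comes from.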

Step 2: the backward

If the forward pass runs cleanly, remove the loss = tf.stop_gradient(loss) line and test the backward pass.
1. Try a very small learning rate.
2. Replace complex blocks of code with simple computations, such as a fully connected layer, that keep the same input and output shapes, to narrow down where the bug lies. You may encounter backward bugs like this.
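A forward pass can be finite while its gradient is not. The sketch below, assuming TensorFlow 2.x with tf.GradientTape, scans the gradients for NaN or Inf; the helper non_finite_gradients is a hypothetical name, and the square root is only a convenient toy example of an operation whose gradient blows up.

```python
import tensorflow as tf

def non_finite_gradients(grads, variables):
    """Return the names of variables whose gradient contains NaN or Inf."""
    bad = []
    for g, v in zip(grads, variables):
        if g is not None and not bool(tf.reduce_all(tf.math.is_finite(g))):
            bad.append(v.name)
    return bad

w = tf.Variable([0.0, 4.0], name="w")
with tf.GradientTape() as tape:
    # The forward value is finite (0 + 2), but the derivative of sqrt(w),
    # 0.5 / sqrt(w), is unbounded at w = 0.
    loss = tf.reduce_sum(tf.sqrt(w))
grads = tape.gradient(loss, [w])

print(float(loss), non_finite_gradients(grads, [w]))
```

Running such a scan after each replacement of a complex block shows quickly whether the offending operation was in the part you swapped out.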
As an aside, it always helps to confirm that every tensor has the shape you expect. Feed fixed-size batches (drop the remainder) and reshape the feature tensors, at the point where the graph receives data from the Dataset, to the shape you expect (otherwise the first dimension is sometimes None). Then print the shape of the tensor in question inside the graph and check that every dimension is a fixed number.
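A short sketch of the fixed-shape idea, assuming TensorFlow 2.x and the tf.data API. BATCH, FEATURES, and the all-zero data are placeholders chosen for illustration.

```python
import tensorflow as tf

BATCH, FEATURES = 4, 3  # assumed sizes for illustration
data = tf.zeros([10, FEATURES])

# Without drop_remainder the last batch may be smaller, so the batch dimension is unknown.
loose = tf.data.Dataset.from_tensor_slices(data).batch(BATCH)

# drop_remainder=True discards the short final batch and fixes the batch dimension.
fixed = tf.data.Dataset.from_tensor_slices(data).batch(BATCH, drop_remainder=True)
# ensure_shape fails loudly if a batch ever arrives with a different shape.
fixed = fixed.map(lambda x: tf.ensure_shape(x, [BATCH, FEATURES]))

print(loose.element_spec.shape)  # first dimension is None
print(fixed.element_spec.shape)  # every dimension is known
```

The cost is that up to BATCH - 1 examples per epoch are skipped, which is usually acceptable while debugging.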