I was running TensorFlow and I happened to have something yielding a NaN. I'd like to know what it is, but I do not know how to find it. The main issue is that in a "normal"
NaNs occurring in the forward pass are one thing, and those occurring in the backward pass are another.
Make sure that there are no extreme inputs, such as NaN values or negative labels, in the prepared dataset. You can check this with NumPy tools, for instance: `assert not np.any(np.isnan(x))`.
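As a sketch of such a sanity check (the names `x`, `y`, and `validate_dataset` are placeholders, assuming `x` holds your features and `y` your labels):

```python
import numpy as np

def validate_dataset(x, y):
    """Basic sanity checks before feeding data into the graph."""
    assert not np.any(np.isnan(x)), "features contain NaN"
    assert not np.any(np.isinf(x)), "features contain Inf"
    assert np.all(y >= 0), "labels contain negative values"

# Clean data passes silently; bad data fails fast with a clear message.
x = np.array([[0.1, 0.2], [0.3, 0.4]])
y = np.array([0, 1])
validate_dataset(x, y)
```

Failing early here is much cheaper than chasing a NaN that only surfaces deep inside the graph several batches later.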
Switch to a CPU environment to get a more detailed traceback, and test the forward pass alone by inserting `loss = tf.stop_gradient(loss)` before computing the gradients, then see whether you can run several batches without errors. If an error occurs, there are several types of potential bugs and corresponding methods:
- Insert `tensor = tf.check_numerics(tensor, 'tensor')` in some suspicious places.
- Use `tf_debug`, as written in this answer.

If everything goes well, remove the `loss = tf.stop_gradient(loss)` line.
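A minimal sketch of the `check_numerics` approach, assuming TensorFlow 2.x eager mode (where the op lives under `tf.debugging.check_numerics`; in graph-mode TF 1.x you keep the returned tensor wired into the graph instead). The `forward` function and the `log` op are illustrative stand-ins for your model's suspicious places:

```python
import tensorflow as tf

def forward(x):
    # Wrap a suspicious intermediate tensor; the op raises
    # InvalidArgumentError as soon as a NaN or Inf appears,
    # pointing at the exact op instead of a NaN loss much later.
    h = tf.math.log(x)  # NaN source if x has non-positive entries
    h = tf.debugging.check_numerics(h, "log output")
    return tf.reduce_sum(h)

forward(tf.constant([1.0, 2.0]))   # valid input: runs fine

try:
    forward(tf.constant([-1.0]))   # log(-1) = NaN
except tf.errors.InvalidArgumentError as e:
    print("caught NaN at:", e.message.splitlines()[0])
```

Because `check_numerics` is an identity op on healthy tensors, it is cheap to sprinkle in while debugging and trivial to remove afterwards.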
As an aside, it's always helpful to make sure that every tensor has the shape you expect. You can try feeding fixed-size batches (dropping the remainder), reshaping the feature tensors (where the graph receives data from the Dataset) to the shapes you expect (otherwise the first dimension would sometimes be None), and then printing the shape of the tensor in question in the graph with fixed numbers.
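For instance, a sketch of fixing batch shapes with the `tf.data` API (assuming TensorFlow 2.x; the `drop_remainder` flag is what drops the final partial batch):

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)

# Without drop_remainder the static batch dimension is unknown (None),
# because the last batch of 10 / 3 may be smaller than 3.
print(dataset.batch(3).element_spec.shape)                       # (None,)

# With drop_remainder=True every batch has exactly 3 elements,
# so the static shape is fully defined.
print(dataset.batch(3, drop_remainder=True).element_spec.shape)  # (3,)
```

With fully defined static shapes, a shape mismatch shows up as an immediate error at the offending op rather than a silent broadcast that can later produce NaNs.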