Here is the code that I am using. I'm trying to get a 1, 0, or hopefully a probability as the result on a real test set. When I just split up the training set and run it on the
tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax))
This was my problem on a project I was testing. Specifically, it ended up being 0*log(0), which produces NaN.
If you replace this with:
tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax + 1e-50))
It should avoid the problem.
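To see the failure mode concretely, here is a minimal sketch (assuming the TensorFlow 1.x API used throughout this answer; float64 is used only so the tiny 1e-50 constant is representable):

import tensorflow as tf  # TF 1.x API assumed (tf.log, tf.Session)

# One example where a zero in the label lines up with a zero prediction.
tf_softmax = tf.constant([[1.0, 0.0]], dtype=tf.float64)
tf_softmax_correct = tf.constant([[1.0, 0.0]], dtype=tf.float64)

bad = -tf.reduce_sum(tf_softmax_correct * tf.log(tf_softmax))          # 0*log(0) -> nan
ok = -tf.reduce_sum(tf_softmax_correct * tf.log(tf_softmax + 1e-50))   # stays finite

with tf.Session() as sess:
    print(sess.run([bad, ok]))  # roughly [nan, 0.0]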
I've also used reduce_mean rather than reduce_sum. If you double the batch size and use reduce_sum, it will double the cost (and the magnitude of the gradient). On top of that, when inspecting the cost with tf.Print (which prints to the console TensorFlow was started from), reduce_mean makes the values more comparable when varying the batch size.
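As a quick illustration of the scaling point (the placeholder shapes and the probabilities below are made up for the example, not taken from the original model):

import numpy as np
import tensorflow as tf  # TF 1.x style, matching the snippets in this answer

y = tf.placeholder(tf.float32, [None, 2])      # one-hot labels
model = tf.placeholder(tf.float32, [None, 2])  # softmax output of the network

cost_sum = -tf.reduce_sum(y * tf.log(model + 1e-50))    # grows with the batch size
cost_mean = -tf.reduce_mean(y * tf.log(model + 1e-50))  # comparable across batch sizes

labels = np.array([[1.0, 0.0]], dtype=np.float32)
probs = np.array([[0.8, 0.2]], dtype=np.float32)

with tf.Session() as sess:
    for k in (1, 2):  # a batch of 1, then the same example duplicated
        feed = {y: np.repeat(labels, k, axis=0), model: np.repeat(probs, k, axis=0)}
        print(k, sess.run([cost_sum, cost_mean], feed_dict=feed))
# cost_sum doubles when the batch doubles; cost_mean does not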
Specifically, this is what I'm using now when debugging:
cross_entropy = -tf.reduce_sum(y*tf.log(model + 1e-50)) ## avoid nan due to 0*log(0)
cross_entropy = tf.Print(cross_entropy, [cross_entropy], "cost") #print to the console tensorflow was started from
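For context, here is a self-contained sketch of how those two lines slot into a small graph. The network, shapes, and learning rate are placeholders of my own, not the original model, and it assumes the TF 1.x API where tf.Print and tf.Session are available:

import tensorflow as tf  # TF 1.x API assumed (tf.Print, tf.Session)

x = tf.placeholder(tf.float32, [None, 4])   # input features (shape is illustrative)
y = tf.placeholder(tf.float32, [None, 3])   # one-hot labels (shape is illustrative)

W = tf.Variable(tf.zeros([4, 3]))
b = tf.Variable(tf.zeros([3]))
model = tf.nn.softmax(tf.matmul(x, W) + b)  # the softmax output being logged

cross_entropy = -tf.reduce_sum(y * tf.log(model + 1e-50))          # avoid nan due to 0*log(0)
cross_entropy = tf.Print(cross_entropy, [cross_entropy], "cost ")  # prints to the launching console

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # every sess.run that touches cross_entropy (e.g. a training step) prints "cost [value]"
    # sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})

Note that tf.Print only fires when the printed tensor is actually evaluated in a sess.run call, so it has to stay on the path you run (here, by reassigning cross_entropy before building the train step).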