I was looking at the TensorFlow MNIST example for beginners and found this part:
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
@dga nicely explained the reason for this behavior: the cross_entropy becomes too large, and so the algorithm is unable to converge. There are a couple of ways to fix this. He already suggested decreasing the learning rate.
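For instance (a minimal sketch, assuming the tutorial's original train step, which uses a learning rate of 0.5), simply picking a smaller learning rate is often enough to keep the loss from blowing up:

    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)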
Gradient descent is the most basic optimizer; almost any of the other optimizers will work properly here:
train_step = tf.train.AdagradOptimizer(0.01).minimize(cross_entropy)
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)
train_step = tf.train.FtrlOptimizer(0.01).minimize(cross_entropy)
train_step = tf.train.RMSPropOptimizer(0.01, 0.1).minimize(cross_entropy)
Another approach is to compute the loss with tf.nn.softmax_cross_entropy_with_logits, which handles the numerical instability for you.
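A minimal sketch of that, assuming the x, W, b and y_ placeholders/variables from the beginners tutorial: feed the raw logits into the loss op instead of applying tf.nn.softmax and tf.log yourself.

    # raw, un-normalized scores (no softmax here)
    logits = tf.matmul(x, W) + b
    # softmax + cross-entropy computed together in a numerically stable way
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

The softmax is applied inside the op, so only call tf.nn.softmax when you actually need the probabilities (e.g. for predictions), not when computing the loss.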