I was looking at the TensorFlow MNIST example for beginners and noticed something in this part:
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
@dga gave a great answer, but I wanted to expand a little.
When I wrote the beginners tutorial, I implemented the cost function like so:
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
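For reference, the standard cross-entropy between the true distribution y' (the one-hot labels, y_ in the code) and the predicted distribution y is:

H_{y'}(y) = -\sum_i y'_i \log(y_i)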
I wrote it that way because that looks most similar to the mathematical definition of cross-entropy. But it might actually be better to do something like this:
cross_entropy = -tf.reduce_mean(y_*tf.log(y))
Why might it be nicer to use a mean instead of a sum? Well, if we sum, then doubling the batch size doubles the cost and also doubles the magnitude of the gradient. Unless we adjust our learning rate (or use an algorithm that adjusts it for us, as @dga suggested), our training will explode! But if we use a mean, then our learning rate becomes roughly independent of our batch size, which is nice.
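To make that concrete, here is a minimal NumPy sketch (not the tutorial's TensorFlow code; the one-parameter logistic model and the random data are made up purely for illustration). It shows that duplicating a batch doubles the gradient of a summed loss but leaves the gradient of a mean loss unchanged:

import numpy as np

rng = np.random.default_rng(0)
w = 0.5                                      # a single scalar weight
x = rng.normal(size=100)                     # a batch of 100 inputs
y = (rng.random(100) < 0.5).astype(float)    # random binary labels

def grad_wrt_w(x, y, reduce):
    # Gradient of the logistic cross-entropy loss with respect to w,
    # reduced over the batch by either np.sum or np.mean.
    p = 1.0 / (1.0 + np.exp(-w * x))         # sigmoid predictions
    return reduce((p - y) * x)               # d(loss_i)/dw = (p_i - y_i) * x_i

x2, y2 = np.concatenate([x, x]), np.concatenate([y, y])  # "double" the batch

print(grad_wrt_w(x, y, np.sum),  grad_wrt_w(x2, y2, np.sum))   # sum: gradient doubles
print(grad_wrt_w(x, y, np.mean), grad_wrt_w(x2, y2, np.mean))  # mean: gradient unchanged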
I'd encourage you to check out Adam (tf.train.AdamOptimizer()). It's often more tolerant of this sort of thing than plain SGD.
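For example, with the TensorFlow 1.x API that the beginners tutorial uses, the swap is a one-line change; the rest of this sketch just repeats the tutorial's model for context, and Adam's default learning rate (0.001) may still need tuning for your setup:

import tensorflow as tf

x  = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
W  = tf.Variable(tf.zeros([784, 10]))
b  = tf.Variable(tf.zeros([10]))
y  = tf.nn.softmax(tf.matmul(x, W) + b)

cross_entropy = -tf.reduce_mean(y_ * tf.log(y))

# Instead of the tutorial's plain SGD step:
# train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)  # default learning rate 0.001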