Why does TensorFlow example fail when increasing batch size?

Asked by 旧时难觅i, 2020-12-03 05:25

I was looking at the TensorFlow MNIST example for beginners and noticed this part:

for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

Increasing the batch size passed to next_batch() makes training fail.
4 Answers
  •  悲&欢浪女 · 2020-12-03 06:08

    @dga gave a great answer, but I wanted to expand a little.

    When I wrote the beginners tutorial, I implemented the cost function like so:

    cross_entropy = -tf.reduce_sum(y_*tf.log(y))

    I wrote it that way because that looks most similar to the mathematical definition of cross-entropy. But it might actually be better to do something like this:

    cross_entropy = -tf.reduce_mean(y_*tf.log(y))

    Why might it be nicer to use a mean instead of a sum? Well, if we sum, then doubling the batch size doubles the cost, and also doubles the magnitude of the gradient. Unless we adjust our learning rate (or use an algorithm that adjusts it for us, like @dga suggested) our training will explode! But if we use a mean, then our learning rate becomes kind of independent of our batch size, which is nice.
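    The scaling effect above is easy to see numerically. Here is a minimal NumPy sketch (the toy logits, labels, and helper names are my own, not from the tutorial): doubling the batch roughly doubles a sum-reduced cross-entropy, while a mean-reduced one stays at about the same scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(y_, y, reduce):
    # reduce is np.sum or np.mean, mirroring tf.reduce_sum / tf.reduce_mean
    return -reduce(y_ * np.log(y))

# Toy softmax outputs y and one-hot labels y_ for a batch of 100 examples.
logits = rng.normal(size=(100, 10))
y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y_ = np.eye(10)[rng.integers(0, 10, size=100)]

small_sum  = cross_entropy(y_[:50], y[:50], np.sum)    # batch of 50
large_sum  = cross_entropy(y_,      y,      np.sum)    # batch of 100
small_mean = cross_entropy(y_[:50], y[:50], np.mean)
large_mean = cross_entropy(y_,      y,      np.mean)

# Doubling the batch roughly doubles the summed cost (and hence the
# gradient magnitude), but leaves the mean cost at about the same scale.
```

    The same argument applies to the gradients, since the gradient of a sum over the batch is the sum of the per-example gradients.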

    I'd also encourage you to check out Adam (tf.train.AdamOptimizer()). It's often more tolerant of hyperparameter fiddling than plain SGD.
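    To illustrate why Adam is more forgiving here, below is a minimal sketch of the standard Adam update rule in NumPy (my own toy setup, not the TensorFlow implementation). Because each step is normalized by a running estimate of the gradient's magnitude, rescaling the gradient by 100x (as a sum-reduced loss on a 100x larger batch would) barely changes the trajectory, whereas fixed-rate SGD would diverge.

```python
import numpy as np

def adam_minimize(grad_fn, w, lr=0.1, steps=500,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on a scalar parameter w given a gradient function."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g           # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g       # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # scale-normalized step
    return w

# Same quadratic loss w**2, but the second gradient is 100x larger, as if
# the cost had been summed over a 100x bigger batch.
w1 = adam_minimize(lambda w: 2 * w, 5.0)
w2 = adam_minimize(lambda w: 200 * w, 5.0)
# Both end up near the minimum at 0. Plain SGD with the same fixed lr=0.1
# would diverge on the scaled gradient (each step multiplies w by -19).
```

    This is only a sketch; in the tutorial's TF 1.x code you would simply swap tf.train.GradientDescentOptimizer for tf.train.AdamOptimizer when building train_step.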
