What is the effect of tf.nn.conv2d() on an input tensor shape?

Question

I am studying TensorBoard code from Dandelion Mane, specifically: https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial/blob/master/mnist.py

His convolution layer is specifically defined as:

def conv_layer(input, size_in, size_out, name="conv"):
  with tf.name_scope(name):
    w = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
    conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
    act = tf.nn.relu(conv + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

I am trying to work out what is the effect of the conv2d on the input tensor size. As far as I can tell it seems the first 3 dimensions are unchanged but the last dimension of the output follows the size of the last dimension of w.

For example, ?x47x36x64 input becomes ?x47x36x128 with w shape=5x5x64x128

And I also see that: ?x24x18x128 becomes ?x24x18x256 with w shape=5x5x128x256

So, for an input of shape [a,b,c,d], is the output shape [a,b,c,w.shape[3]]?

Would it be correct to think that the first dimension does not change?

Answer 1:

This works in your case because of the stride used and the padding applied. The output width and height will not always be the same as the input.

Check out this excellent discussion of the topic. The basic takeaway (taken almost verbatim from that link) is that a convolution layer:

Accepts an input volume of size W1 x H1 x D1
Requires four hyperparameters:
- Number of filters K
- Spatial extent of filters F
- The stride with which the filter moves S
- The amount of zero padding P
Produces a volume of size W2 x H2 x D2 where:
- W2 = (W1 - F + 2*P)/S + 1
- H2 = (H1 - F + 2*P)/S + 1
- D2 = K
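As a minimal sketch, the formulas above can be written as a small helper. The function name `conv_output_size` and its argument names are illustrative, not part of TensorFlow, and the sketch assumes symmetric padding P on every side and floor division when the filter does not tile the input evenly.

```python
def conv_output_size(w1, h1, f, s, p, k):
    """Return (W2, H2, D2) for a conv layer using the formulas above.

    Assumes the same zero padding p on every side and rounds down
    when (input - f + 2*p) is not divisible by the stride s.
    """
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k


# 5x5 filters, stride 1, padding 2, 128 filters on a 36-wide, 47-high input
print(conv_output_size(36, 47, f=5, s=1, p=2, k=128))  # (36, 47, 128)
# Same layer without padding: each spatial dimension shrinks by F - 1 = 4
print(conv_output_size(36, 47, f=5, s=1, p=0, k=128))  # (32, 43, 128)
```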

And when you are processing batches of data in TensorFlow, tensors typically have shape [batch_size, height, width, depth] (the default NHWC layout), so the first dimension, which is just the number of samples in your batch, does not change.

Note that the amount of padding P in the above is a little tricky with TF. When you pass padding="SAME" to tf.nn.conv2d, TensorFlow zero-pads the image so that the output size is ceil(input size / stride); with a stride of 1 this means the output matches the input. It may not add the same amount of padding to both sides: when the total padding is odd, the extra row or column goes on the bottom or right, so the two sides differ by at most one. This SO thread has some good discussion on the topic.
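To make the asymmetric case concrete, here is a rough sketch of the "SAME" padding arithmetic along one dimension, following the rule described in the TensorFlow padding documentation. The helper name `same_padding` is hypothetical; TensorFlow does this internally and does not expose a function with this name.

```python
import math


def same_padding(n, f, s):
    """Return (output_size, pad_before, pad_after) for padding="SAME".

    n: input size along one spatial dimension
    f: filter (or pooling window) size along that dimension
    s: stride along that dimension
    """
    out = math.ceil(n / s)
    pad_total = max((out - 1) * s + f - n, 0)
    pad_before = pad_total // 2            # top or left
    pad_after = pad_total - pad_before     # bottom or right gets the extra one
    return out, pad_before, pad_after


# 5-wide filter, stride 1: symmetric padding of 2 on each side
print(same_padding(47, f=5, s=1))  # (47, 2, 2)
# 2-wide window, stride 2 on an odd-sized input: padding only on one side
print(same_padding(47, f=2, s=2))  # (24, 0, 1)
```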

In general, with a stride S of 1 (which your network has) and an odd filter size F, zero padding of P = (F - 1) / 2 will ensure that the output width/height equals the input, i.e. W2 = W1 and H2 = H1. In your case, F is 5, so tf.nn.conv2d must be adding two rows or columns of zeros to each side of the image for a P of 2, and your output width according to the above equation is W2 = (W1 - 5 + 2*2)/1 + 1 = W1 - 1 + 1 = W1.
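Putting this together for the conv_layer in the question, the sketch below traces the NHWC shape through the stride-1 "SAME" convolution and the stride-2 "SAME" max pool. It is plain Python rather than TensorFlow, the helper names are made up for illustration, and it relies on the fact that with "SAME" padding the output size is ceil(input / stride).

```python
import math


def same_output_dim(n, stride):
    # With padding="SAME" the output size depends only on input size and stride.
    return math.ceil(n / stride)


def conv_layer_shape(input_shape, size_out, conv_stride=1, pool_stride=2):
    """Trace an NHWC shape through conv2d (SAME) then max_pool (SAME)."""
    batch, height, width, _ = input_shape
    conv_shape = (batch,
                  same_output_dim(height, conv_stride),
                  same_output_dim(width, conv_stride),
                  size_out)
    pool_shape = (batch,
                  same_output_dim(conv_shape[1], pool_stride),
                  same_output_dim(conv_shape[2], pool_stride),
                  size_out)
    return conv_shape, pool_shape


conv_shape, pool_shape = conv_layer_shape((None, 47, 36, 64), size_out=128)
print(conv_shape)  # (None, 47, 36, 128)
print(pool_shape)  # (None, 24, 18, 128)
```

The pooled shape ?x24x18x128 matches the input of the second layer quoted in the question, which is why the next convolution sees 24x18 rather than 47x36.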

Source: https://stackoverflow.com/questions/46105855/what-is-the-effect-of-tf-nn-conv2d-on-an-input-tensor-shape

Tags

python

tensorflow

deep-learning

tensorboard