What is the effect of tf.nn.conv2d() on an input tensor shape?

本秂侑毒 submitted on 2019-12-11 07:34:59

Question


I am studying the TensorBoard code from Dandelion Mane, specifically: https://github.com/dandelionmane/tf-dev-summit-tensorboard-tutorial/blob/master/mnist.py

His convolution layer is defined as:

import tensorflow as tf  # TensorFlow 1.x API (tf.truncated_normal, tf.summary.histogram)

def conv_layer(input, size_in, size_out, name="conv"):
  with tf.name_scope(name):
    # Filter shape: [filter_height, filter_width, in_channels, out_channels]
    w = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
    b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
    conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
    act = tf.nn.relu(conv + b)
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    # 2x2 max pool with stride 2
    return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

I am trying to work out what effect conv2d has on the input tensor size. As far as I can tell, the first three dimensions are unchanged, but the last dimension of the output follows the size of the last dimension of w.

For example, a ?x47x36x64 input becomes ?x47x36x128 with w of shape 5x5x64x128.

I also see that ?x24x18x128 becomes ?x24x18x256 with w of shape 5x5x128x256.

So, for an input of size [a, b, c, d], is the output size [a, b, c, w.shape[3]]?

Would it be correct to think that the first dimension does not change?


Answer 1:


This works in your case because of the stride used and the padding applied. The output width and height will not always be the same as the input.

Check out this excellent discussion of the topic. The basic takeaway (taken almost verbatim from that link) is that a convolution layer:

  • Accepts an input volume of size W1 x H1 x D1
  • Requires four hyperparameters:
    • Number of filters K
    • Spatial extent of filters F
    • The stride with which the filter moves S
    • The amount of zero padding P
  • Produces a volume of size W2 x H2 x D2 where:
    • W2 = (W1 - F + 2*P)/S + 1
    • H2 = (H1 - F + 2*P)/S + 1
    • D2 = K
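The formulas above translate directly into a small helper. This is a minimal sketch in plain Python (the names `conv_output_size` and `conv_output_volume` are hypothetical, not TensorFlow APIs); it assumes P zeros are added to each side and uses floor division, which is how the formula is usually applied when (W1 - F + 2*P) is not divisible by S.

```python
def conv_output_size(w1, f, s, p):
    """Spatial output size of a convolution: floor((W1 - F + 2P) / S) + 1.

    w1: input width (or height), f: filter size, s: stride,
    p: zero padding added to EACH side (assumed symmetric).
    """
    return (w1 - f + 2 * p) // s + 1


def conv_output_volume(w1, h1, k, f, s, p):
    """Map a W1 x H1 x D1 input volume to W2 x H2 x D2 for K filters.

    The input depth D1 only has to match the filters' depth;
    it does not appear in the result.
    """
    w2 = conv_output_size(w1, f, s, p)
    h2 = conv_output_size(h1, f, s, p)
    d2 = k  # output depth equals the number of filters
    return w2, h2, d2


# 5x5 filters, stride 1, padding 2: spatial size is preserved.
print(conv_output_volume(36, 47, k=128, f=5, s=1, p=2))  # (36, 47, 128)
# Same filters, stride 2, no padding: spatial size shrinks.
print(conv_output_volume(36, 47, k=128, f=5, s=2, p=0))  # (16, 22, 128)
```

The second call shows why the width and height only stay the same for particular choices of stride and padding.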

When you process batches of data in TensorFlow, they typically have shape [batch_size, height, width, channels] (the default NHWC data_format of tf.nn.conv2d), so the first dimension, which is just the number of samples in your batch, does not change.
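Applied to a whole NHWC batch, the rule from the question becomes: keep the batch size, recompute height and width, and replace the channel count with the filter's last dimension. A plain-Python sketch (the helper `conv2d_output_shape` is hypothetical; `None` stands in for the unknown batch size `?`, and padding is assumed to be symmetric). With `pad=2` it reproduces the shapes the question reports for 5x5 filters.

```python
def conv2d_output_shape(input_shape, filter_shape, stride=1, pad=0):
    """Output shape of an NHWC convolution with symmetric zero padding.

    input_shape:  [batch, height, width, in_channels]; batch may be None.
    filter_shape: [filter_height, filter_width, in_channels, out_channels].
    """
    batch, h, w, c_in = input_shape
    fh, fw, f_in, c_out = filter_shape
    if c_in != f_in:
        raise ValueError(f"input has {c_in} channels but filter expects {f_in}")
    out_h = (h - fh + 2 * pad) // stride + 1
    out_w = (w - fw + 2 * pad) // stride + 1
    # The batch dimension passes through unchanged; depth becomes out_channels.
    return [batch, out_h, out_w, c_out]


print(conv2d_output_shape([None, 47, 36, 64], [5, 5, 64, 128], stride=1, pad=2))
# [None, 47, 36, 128]
```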

Note that the amount of padding P above is a little tricky in TF. When you pass padding="SAME" to tf.nn.conv2d, TensorFlow pads the input with zeros so that the output size is ceil(input_size / stride), which with a stride of 1 is the same size as the input. It may not add the same amount of padding to both sides, though: when the total padding needed is odd, the bottom/right side gets one more zero than the top/left. This SO thread has some good discussion on the topic.
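The SAME/VALID rule can be sketched in plain Python as below. This is a reimplementation for illustration, following the formulas as I understand them from TensorFlow's convolution documentation; `tf_padding` is a hypothetical name, not a TensorFlow function. The same rule also accounts for the question's 47 → 24 and 36 → 18 after the 2x2, stride-2 max pool.

```python
import math


def tf_padding(in_size, filter_size, stride, padding):
    """Output size and (before, after) zero padding along one spatial axis.

    Sketch of the SAME/VALID formulas described in the TensorFlow docs.
    """
    if padding == "VALID":
        # No padding: only positions where the filter fits entirely.
        out = math.ceil((in_size - filter_size + 1) / stride)
        return out, (0, 0)
    if padding == "SAME":
        out = math.ceil(in_size / stride)
        total = max((out - 1) * stride + filter_size - in_size, 0)
        before = total // 2     # top / left
        after = total - before  # bottom / right gets the extra zero, if any
        return out, (before, after)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# 5x5 conv, stride 1, SAME: two zeros on each side, size preserved.
print(tf_padding(47, 5, 1, "SAME"))  # (47, (2, 2))
# 2x2 max pool, stride 2, SAME: one extra zero on the bottom only.
print(tf_padding(47, 2, 2, "SAME"))  # (24, (0, 1))
print(tf_padding(36, 2, 2, "SAME"))  # (18, (0, 0))
# A 4x4 filter needs an odd total of 3 zeros: 1 before, 2 after.
print(tf_padding(47, 4, 1, "SAME"))  # (47, (1, 2))
```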

In general, with a stride S of 1 (which your network uses) and an odd filter size F, zero padding of P = (F - 1) / 2 ensures that the output width/height equals the input, i.e. W2 = W1 and H2 = H1. In your case F is 5, so tf.nn.conv2d must be adding two zeros to each side of the image (P = 2), and by the equation above your output width is W2 = (W1 - 5 + 2*2)/1 + 1 = (W1 - 1) + 1 = W1.
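A quick numeric check of that argument, sketched in plain Python under the same assumptions (stride 1, odd F, P = (F - 1) / 2 on each side). It confirms W2 = W1 over a range of widths, then traces the question's `conv_layer` (5x5 SAME conv followed by a 2x2, stride-2 SAME max pool) on the reported input shape. The helpers `conv_out` and `conv_layer_shape` are hypothetical names for illustration.

```python
import math


def conv_out(w1, f, s, p):
    # General formula: W2 = floor((W1 - F + 2P) / S) + 1
    return (w1 - f + 2 * p) // s + 1


# Stride 1 and P = (F - 1) / 2 preserve the size for every odd filter size.
for f in (1, 3, 5, 7):
    p = (f - 1) // 2
    assert all(conv_out(w1, f, 1, p) == w1 for w1 in range(f, 200))


def conv_layer_shape(shape, size_out):
    """Shape after the question's conv_layer: SAME 5x5 conv, then SAME 2x2/2 max pool."""
    batch, h, w, _ = shape
    # conv: stride 1, P = 2 -> height and width unchanged, depth becomes size_out
    h, w = conv_out(h, 5, 1, 2), conv_out(w, 5, 1, 2)
    # max pool with SAME padding: output size is ceil(size / stride)
    h, w = math.ceil(h / 2), math.ceil(w / 2)
    return [batch, h, w, size_out]


print(conv_layer_shape([None, 47, 36, 64], 128))  # [None, 24, 18, 128]
```

The traced result matches the ?x24x18x128 tensor the question sees entering the next layer: the convolution only changes the depth, and the halving of height and width comes from the max pool.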



Source: https://stackoverflow.com/questions/46105855/what-is-the-effect-of-tf-nn-conv2d-on-an-input-tensor-shape
