Question
In this tutorial, the output volumes are stated in output [25], and the receptive fields are specified in output [26].
Okay, the input volume [3, 227, 227] gets convolved with the region of size [3, 11, 11].
Using this formula (W−F+2P)/S+1, where:
W = the input volume size
F = the receptive field size
P = padding
S = stride
...results in (227 - 11)/4 + 1 = 55, i.e. [55*55*96]. So far so good :)
For 'pool1' they used F=3 and S=2, I think? The calculation checks out: (55-3)/2+1=27.
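The two calculations above can be reproduced with a small helper. This is only a sketch: the function name conv_output_size is made up for illustration and is not part of Caffe, and it assumes the division is rounded down (both divisions here are exact, so the rounding mode does not affect the result).

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size from (W - F + 2P)/S + 1.

    w: input size, f: receptive field size, p: padding, s: stride.
    Hypothetical helper for illustration; assumes floor division.
    """
    return (w - f + 2 * p) // s + 1


# 'conv1': 227x227 input, 11x11 receptive field, stride 4, no padding
conv1 = conv_output_size(227, 11, p=0, s=4)

# 'pool1': 3x3 window, stride 2 (the values guessed in the question)
pool1 = conv_output_size(conv1, 3, p=0, s=2)

print(conv1, pool1)
```

Note that the parentheses matter: it is (55-3)/2+1, not 55-3/2+1.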
From this point I get a bit confused. The receptive field for the second convnet layer is [48, 5, 5], yet the output for 'conv2' is equal to [256, 27, 27]. What calculation happened here?
And then, the height and width of the output volumes of 'conv3' to 'conv4' are all the same [13, 13]? What's going on?
Thanks!
Answer 1:
If you look closely at the parameters of the conv2 layer you'll notice
pad: 2
That is, the input blob is padded by 2 extra pixels all around, thus the formula now is
27 + 2 + 2 - ( 5 - 1 ) = 27
Padding the input with 2 pixels on each side, for a kernel of size 5 and stride 1, yields the same output size.
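As a rough check of the padding argument, the sketch below plugs pad: 2 into the formula from the question, (W−F+2P)/S+1. The helper name conv_output_size is hypothetical, and a stride of 1 is assumed for conv2, which is what the arithmetic above implies.

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size from (W - F + 2P)/S + 1 (hypothetical helper)."""
    return (w - f + 2 * p) // s + 1


# 'conv2': 27x27 input, 5x5 kernel, pad 2, stride 1 (stride is an assumption)
conv2_padded = conv_output_size(27, 5, p=2, s=1)

# Same layer without padding, for comparison: the output would shrink
conv2_unpadded = conv_output_size(27, 5, p=0, s=1)

# For odd kernel sizes and stride 1, a padding of (F - 1) / 2 keeps the size
size_preserved = all(
    conv_output_size(27, f, p=(f - 1) // 2, s=1) == 27 for f in (1, 3, 5, 7, 11)
)

print(conv2_padded, conv2_unpadded, size_preserved)
```

With pad 2 the width and height stay at 27; without padding they would drop to 23.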
Source: https://stackoverflow.com/questions/32979683/how-did-they-calculate-the-output-volume-for-this-convnet-example-in-caffe