How did they calculate the output volume for this convnet example in Caffe?

Question

In this tutorial, the output volumes are stated in output [25], and the receptive fields are specified in output [26].

Okay, the input volume [3, 227, 227] gets convolved with the region of size [3, 11, 11].

Using this formula (W−F+2P)/S+1, where:
W = the input volume size
F = the receptive field size
P = padding
S = stride

...results in (227 - 11)/4 + 1 = 55, i.e. [55*55*96]. So far so good :)

For 'pool1' they used F=3 and S=2, I think? The calculation checks out: (55 - 3)/2 + 1 = 27.
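
The two calculations above can be checked with a small sketch. The helper name `conv_output_size` is made up for illustration and is not part of Caffe; it assumes floor division, which makes no difference here because both divisions are exact.

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size from (W - F + 2P)/S + 1.

    w: input width/height, f: receptive field size,
    p: padding per side, s: stride.
    Hypothetical helper, not a Caffe function.
    """
    return (w - f + 2 * p) // s + 1


# conv1: 227 wide input, 11x11 kernel, stride 4, no padding
conv1 = conv_output_size(227, 11, p=0, s=4)

# pool1: F=3, S=2 as guessed in the question, applied to conv1's output
pool1 = conv_output_size(conv1, 3, p=0, s=2)

print(conv1, pool1)  # 55 27
```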

From this point I get a bit confused. The receptive field for the second convnet layer is [48, 5, 5], yet the output for 'conv2' is equal to [256, 27, 27]. What calculation happened here?

And then, the height and width of the output volumes of 'conv3' to 'conv4' are all the same [13, 13]? What's going on?

Thanks!

Answer 1:

If you look closely at the parameters of the conv2 layer, you'll notice

   pad: 2

That is, the input blob is padded with 2 extra pixels on every side, so with a stride of 1 the calculation becomes

27 + 2 + 2 - ( 5 - 1 ) = 27

With a stride of 1, padding the input by 2 pixels on each side for a kernel of size 5 leaves the output the same size as the input.
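
As a rough check of that arithmetic, here is a minimal sketch. Both helper names are hypothetical, stride 1 is assumed for conv2, and the 3x3 kernel with pad 1 at the end is an illustrative assumption rather than something stated in this answer.

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial output size from (W - F + 2P)/S + 1 (hypothetical helper)."""
    return (w - f + 2 * p) // s + 1


def size_preserving_pad(f):
    """Padding per side that keeps the size unchanged for an odd kernel at stride 1."""
    return (f - 1) // 2


# conv2: 27 wide input, 5x5 kernel, pad 2, stride 1 (stride assumed)
conv2 = conv_output_size(27, 5, p=2, s=1)  # 27 + 2 + 2 - (5 - 1)
print(conv2)                   # 27
print(size_preserving_pad(5))  # 2

# Illustrative assumption: a 3x3 kernel with pad 1 on a 13 wide input
# would likewise leave the size at 13.
print(conv_output_size(13, 3, p=size_preserving_pad(3), s=1))  # 13
```

The same rule, P = (F - 1)/2 at stride 1, would explain several consecutive layers sharing one spatial size, provided their kernel and pad settings follow that pattern.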

Source: https://stackoverflow.com/questions/32979683/how-did-they-calculate-the-output-volume-for-this-convnet-example-in-caffe

Tags

machine-learning

neural-network

convolution

deep-learning

caffe
