可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?
In my opinion, 'VALID' means there will be no zero padding outside the edges when we do max pool.
According to A guide to convolution arithmetic for deep learning, it says that there will be no padding in pool operator, i.e. just use 'VALID' of tensorflow. But what is 'SAME' padding of max pool in tensorflow?
回答1:
I'll give an example to make it clearer:
x: input image of shape [2, 3], 1 channel valid_pad: max pool with 2x2 kernel, stride 2 and VALID padding. same_pad: max pool with 2x2 kernel, stride 2 and SAME padding (this is the classic way to go)
The output shapes are:
valid_pad: here, no padding so the output shape is [1, 1] same_pad: here, we pad the image to the shape [2, 4] (with -inf and then apply max pool), so the output shape is [1, 2]
x = tf.constant([[1., 2., 3.], [4., 5., 6.]]) x = tf.reshape(x, [1, 2, 3, 1]) # give a shape accepted by tf.nn.max_pool valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME') valid_pad.get_shape() == [1, 1, 1, 1] # valid_pad is [5.] same_pad.get_shape() == [1, 1, 2, 1] # same_pad is [5., 6.]
回答2:
If you like ascii art:
"VALID" = without padding:
inputs: 1 2 3 4 5 6 7 8 9 10 11 (12 13) |________________| dropped |_________________|
"SAME" = with zero padding:
pad| |pad inputs: 0 |1 2 3 4 5 6 7 8 9 10 11 12 13|0 0 |________________| |_________________| |________________|
In this example:
- Input width = 13
- Filter width = 6
- Stride = 5
Notes:
"VALID" only ever drops the right-most columns (or bottom-most rows). "SAME" tries to pad evenly left and right, but if the amount of columns to be added is odd, it will add the extra column to the right, as is the case in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).
回答3:
The TensorFlow Convolution example gives an overview about the difference between SAME and VALID :
- For the
SAME padding, the output height and width are computed as:
out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))
And
- For the
VALID padding, the output height and width are computed as:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
回答4:
When stride is 1 (more typical with convolution than pooling), we can think of the following distinction:
"SAME": output size is the same than input size. This requires the filter window to slip outside input map, hence the need to pad. "VALID": Filter window stays at valid position inside input map, so output size shrinks by filter_size - 1. No padding occurs.
回答5:
Padding is an operation to increase the size of the input data. In case of 1-dimensional data you just append/prepend the array with a constant, in 2-dim you surround matrix with these constants. In n-dim you surround your n-dim hypercube with the constant. In most of the cases this constant is zero and it is called zero-padding.
Here is an example of zero-padding with p=1 applied to 2-d tensor: 
You can use arbitrary padding for your kernel but some of the padding values are used more frequently than others they are:
- VALID padding. The easiest case, means no padding at all. Just leave your data the same it was.
- SAME padding sometimes called HALF padding. It is called SAME because for a convolution with a stride=1, (or for pooling) it should produce output of the same size as the input. It is called HALF because for a kernel of size
k 
- FULL padding is the maximum padding which does not result in a convolution over just padded elements. For a kernel of size
k, this padding is equal to k - 1.
To use arbitrary padding in TF, you can use tf.pad()
回答6:
There are three choices of padding: valid (no padding), same (or half), full. You can find explanations (in Theano) here: http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
The valid padding involves no zero padding, so it covers only the valid input, not including artificially generated zeros. The length of output is ((the length of input) - (k-1)) for the kernel size k if the stride s=1.
The same padding makes the size of outputs be the same with that of inputs when s=1. If s=1, the number of zeros padded is (k-1).
The full padding means that the kernel runs over the whole inputs, so at the ends, the kernel may meet the only one input and zeros else. The number of zeros padded is 2(k-1) if s=1. The length of output is ((the length of input) + (k-1)) if s=1.
Therefore, the number of paddings: (valid)
回答7:
Based on the explanation here and following up on Tristan's answer, I usually use these quick functions for sanity checks.
# a function to help us stay clean def getPaddings(pad_along_height,pad_along_width): # if even.. easy.. if pad_along_height%2 == 0: pad_top = pad_along_height / 2 pad_bottom = pad_top # if odd else: pad_top = np.floor( pad_along_height / 2 ) pad_bottom = np.floor( pad_along_height / 2 ) +1 # check if width padding is odd or even # if even.. easy.. if pad_along_width%2 == 0: pad_left = pad_along_width / 2 pad_right= pad_left # if odd else: pad_left = np.floor( pad_along_width / 2 ) pad_right = np.floor( pad_along_width / 2 ) +1 # return pad_top,pad_bottom,pad_left,pad_right # strides [image index, y, x, depth] # padding 'SAME' or 'VALID' # bottom and right sides always get the one additional padded pixel (if padding is odd) def getOutputDim (inputWidth,inputHeight,filterWidth,filterHeight,strides,padding): if padding == 'SAME': out_height = np.ceil(float(inputHeight) / float(strides[1])) out_width = np.ceil(float(inputWidth) / float(strides[2])) # pad_along_height = ((out_height - 1) * strides[1] + filterHeight - inputHeight) pad_along_width = ((out_width - 1) * strides[2] + filterWidth - inputWidth) # # now get padding pad_top,pad_bottom,pad_left,pad_right = getPaddings(pad_along_height,pad_along_width) # print 'output height', out_height print 'output width' , out_width print 'total pad along height' , pad_along_height print 'total pad along width' , pad_along_width print 'pad at top' , pad_top print 'pad at bottom' ,pad_bottom print 'pad at left' , pad_left print 'pad at right' ,pad_right elif padding == 'VALID': out_height = np.ceil(float(inputHeight - filterHeight + 1) / float(strides[1])) out_width = np.ceil(float(inputWidth - filterWidth + 1) / float(strides[2])) # print 'output height', out_height print 'output width' , out_width print 'no padding' # use like so getOutputDim (80,80,4,4,[1,1,1,1],'SAME')
回答8:
I am quoting this answer from official tensorflow docs https://www.tensorflow.org/api_guides/python/nn#Convolution For the 'SAME' padding, the output height and width are computed as:
out_height = ceil(float(in_height) / float(strides[1])) out_width = ceil(float(in_width) / float(strides[2]))
and the padding on the top and left are computed as:
pad_along_height = max((out_height - 1) * strides[1] + filter_height - in_height, 0) pad_along_width = max((out_width - 1) * strides[2] + filter_width - in_width, 0) pad_top = pad_along_height // 2 pad_bottom = pad_along_height - pad_top pad_left = pad_along_width // 2 pad_right = pad_along_width - pad_left
For the 'VALID' padding, the output height and width are computed as:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1])) out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
and the padding values are always zero.
回答9:
Quick Explanation
VALID: Don't apply any padding, i.e., assume that all dimensions are valid so that input image fully gets covered by filter and stride you specified.
SAME: Apply padding to input (if needed) so that input image gets fully covered by filter and stride you specified. For stride 1, this will ensure that output image size is same as input.
Notes
- This applies to coov layers as well as max pool layers in same way
- The term "valid" is bit of a misnomer because things don't become "invalid" if you drop part of the image. Sometime you might even want that. This should have probably be called "NO_PADDING" instead.
- The term "same" is a misnomer too because it only makes sense for stride of 1 when output dimension is same as input dimension. For stride of 2, output dimensions will be half, for example. This should have probably be called "AUTO_PADDING" instead.
- In
SAME (i.e. auto-pad mode), Tensorflow will try to spread padding evenly on both left and right. - In
VALID (i.e. no padding mode), Tensorflow will drop right and/or bottom cells if your filter and stride doesn't full cover input image.