According to this paper, the output shape is N + H - 1, where N is the input height or width and H is the kernel height or width. This is obvious inver
This discussion is really helpful. I'd just like to add some additional information.
padding='SAME' can also give the bottom and right sides one additional padded pixel. According to the TensorFlow documentation, the test case below
strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 12, 8, 2]
# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]
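uses padding='SAME'. Writing W for the conv2d input size along a dimension (here, the conv2d_transpose output size), F for the filter size and S for the stride, we can interpret padding='SAME' as: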
(W−F+pad_along_height)/S+1 = out_height,
(W−F+pad_along_width)/S+1 = out_width.
So (12 - 3 + pad_along_height) / 2 + 1 = 6, which gives pad_along_height = 1. Then pad_top = pad_along_height / 2 = 1 / 2 = 0 (integer division), and pad_bottom = pad_along_height - pad_top = 1.
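The arithmetic above can be checked with a small sketch in plain Python. The helper name same_padding is made up for illustration and is not a TensorFlow function; it assumes the documented 'SAME' rule, where the output size is ceil(in / stride) and any odd leftover pixel goes to the bottom or right side.

```python
import math

def same_padding(in_size, filter_size, stride):
    """Hypothetical helper: (out_size, pad_before, pad_after) for a
    forward conv2d with padding='SAME' along one dimension."""
    out_size = math.ceil(in_size / stride)
    # Total padding needed so that the last window still fits.
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2           # top / left (integer division)
    pad_after = pad_total - pad_before    # bottom / right gets the extra pixel
    return out_size, pad_before, pad_after

# Height: conv2d input 12, filter 3, stride 2.
print(same_padding(12, 3, 2))  # (6, 0, 1)
# Width: conv2d input 8, filter 3, stride 2.
print(same_padding(8, 3, 2))   # (4, 0, 1)
```

In both dimensions the single padded pixel ends up on the bottom or right side, matching pad_top = 0 and pad_bottom = 1 above.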
As for padding='VALID', as the name suggests, the filter is applied only at valid positions, that is, where it fits entirely inside the input, so we take the number of padded pixels to be 0. For example, consider the test case below:
strides = [1, 2, 2, 1]
# Input, output: [batch, height, width, depth]
x_shape = [2, 6, 4, 3]
y_shape = [2, 13, 9, 2]
# Filter: [kernel_height, kernel_width, output_depth, input_depth]
f_shape = [3, 3, 2, 3]
The output shape of conv2d is
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
= ceil(float(13 - 3 + 1) / float(2)) = ceil(11/2) = 6
= (W−F)/S + 1.
Because (W−F)/S+1 = (13-3)/2+1 = 6 is an integer, we don't need to add zero pixels around the border of the image, and pad_top and pad_left from the padding='VALID' section of the TensorFlow documentation are both 0.
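As a rough check of the 'VALID' case, here is a minimal sketch in plain Python. Both helper names are made up for illustration and are not TensorFlow functions. The second helper assumes the smallest transposed-convolution output size that maps back to the given input, (in - 1) * stride + filter, which reduces to N + H - 1 when the stride is 1.

```python
import math

def valid_out_size(in_size, filter_size, stride):
    """Hypothetical helper: forward conv2d output size with padding='VALID'."""
    return math.ceil((in_size - filter_size + 1) / stride)

def valid_transpose_out_size(in_size, filter_size, stride):
    """Hypothetical helper: smallest conv2d_transpose output size whose
    forward 'VALID' conv2d gives back in_size (an assumption of this sketch)."""
    return (in_size - 1) * stride + filter_size

# Forward direction: y_shape [2, 13, 9, 2] -> x_shape [2, 6, 4, 3].
print(valid_out_size(13, 3, 2))  # 6
print(valid_out_size(9, 3, 2))   # 4

# Transposed direction: x_shape [2, 6, 4, 3] -> y_shape [2, 13, 9, 2].
print(valid_transpose_out_size(6, 3, 2))  # 13
print(valid_transpose_out_size(4, 3, 2))  # 9
```

The two directions agree with x_shape and y_shape in the test case, with no padding involved.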