How to get the correct output shape when converting code from tf.nn.conv2d_transpose to tf.keras.layers.Conv2DTranspose

Question

I am converting some code from TensorFlow to tf.keras. I used to upsample with tf.nn.conv2d_transpose, which takes an output_shape parameter, and everything worked fine. Now that I have switched to Keras models I use the Conv2DTranspose layer, but I can't get the desired output shape.

For the sake of simplicity, let's assume my input shape is (38,25). I apply a max-pooling operation with pool_size=(2,2), strides=(2,2) and padding='same', which outputs a (19,13) volume. Then the 'deconvolution' is applied with kernel_size=(4,4), padding='same' and strides=(2,2), which returns a (38,26) volume.
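To make the arithmetic explicit, here is a minimal sketch in plain Python. It assumes that 'same' pooling rounds the length up (ceil(n / stride)) and that a 'same' transposed convolution without output_padding multiplies the length by the stride. The helper name pooled_length is mine, not part of any library.

```python
import math


def pooled_length(length, stride):
    """Length after pooling with padding='same' (assumed rule: round up)."""
    return math.ceil(length / stride)


input_shape = (38, 25)
stride = 2

# Pooling: the odd width 25 is rounded up to 13.
pooled = tuple(pooled_length(n, stride) for n in input_shape)

# 'same' transposed convolution without output_padding: length * stride.
upsampled = tuple(n * stride for n in pooled)

print(pooled)     # (19, 13)
print(upsampled)  # (38, 26)
```

The extra column comes from the odd width: 25 is rounded up to 13 by the pooling, and 13 * 2 is 26.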

I tried every combination of padding and output_padding, and none of them produces a (38,25) volume, which is what I got from tf.nn.conv2d_transpose with an output_shape of (38,25).

The following sample code shows what I mean:

import tensorflow as tf
import numpy as np 
from tensorflow.keras.layers import Conv2DTranspose, Input, MaxPool2D, ZeroPadding2D
from tensorflow.keras import Model

I = Input(shape=(38,25,3))
X = MaxPool2D(pool_size=(2,2),strides=(2,2),padding='same')(I)
Y = []
# Collect the outputs for every combination of padding and output_padding
for padding in ['same', 'valid']:
    for i in range(2):  # output_padding must be in [0, stride - 1]
        for j in range(2):
            Y.append(Conv2DTranspose(3, (4, 4), strides=(2, 2), padding=padding,
                                     name=padding + str(i) + str(j), use_bias=False,
                                     output_padding=(i, j))(X))
    # Same layer with output_padding left at its default (None)
    Y.append(Conv2DTranspose(3, (4, 4), strides=(2, 2), padding=padding,
                             name=padding + '_no_output_padding', use_bias=False)(X))
model = Model(inputs=I, outputs=Y, name = 'model')
model.summary()

This gives the following output:

input_1 (InputLayer)            (None, 38, 25, 3)    0                                            
______________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 19, 13, 3)    0           input_1[0][0]                    
______________________________________________________________________________
same00 (Conv2DTranspose)        (None, 36, 24, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
same01 (Conv2DTranspose)        (None, 36, 25, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
same10 (Conv2DTranspose)        (None, 37, 24, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
same11 (Conv2DTranspose)        (None, 37, 25, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
same_no_output_padding (Conv2DT (None, 38, 26, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
valid00 (Conv2DTranspose)       (None, 40, 28, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
valid01 (Conv2DTranspose)       (None, 40, 29, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
valid10 (Conv2DTranspose)       (None, 41, 28, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
valid11 (Conv2DTranspose)       (None, 41, 29, 3)    144         max_pooling2d[0][0]              
______________________________________________________________________________
valid_no_output_padding (Conv2D (None, 40, 28, 3)    144         max_pooling2d[0][0]

Here a layer named <padding><i><j> uses that padding type with output_padding=(i,j). For example, same01 means padding='same' and output_padding=(0,1). Note that no combination outputs the desired shape (38,25,3).
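The shapes in the summary above can be reproduced without building a model. The sketch below is my reading of the length rule Keras seems to apply per spatial axis; the function name transposed_length is hypothetical and I have not checked the rule against every Keras version, so treat it as an assumption rather than the library's definition.

```python
def transposed_length(input_length, kernel, stride, padding, output_padding=None):
    """Assumed output length of Conv2DTranspose along one spatial axis."""
    if padding not in ("same", "valid"):
        raise ValueError("padding must be 'same' or 'valid'")
    if output_padding is None:
        if padding == "same":
            return input_length * stride
        return input_length * stride + max(kernel - stride, 0)
    # With an explicit output_padding, 'same' is assumed to pad kernel // 2.
    pad = kernel // 2 if padding == "same" else 0
    return (input_length - 1) * stride + kernel - 2 * pad + output_padding


pooled = (19, 13)   # shape after the max-pooling layer
kernel, stride = 4, 2

shapes = {}
for padding in ("same", "valid"):
    for i in range(stride):
        for j in range(stride):
            shapes[padding + str(i) + str(j)] = tuple(
                transposed_length(n, kernel, stride, padding, op)
                for n, op in zip(pooled, (i, j))
            )
    shapes[padding + "_no_output_padding"] = tuple(
        transposed_length(n, kernel, stride, padding) for n in pooled
    )

for name, shape in shapes.items():
    print(name, shape)

# The target shape never appears among the candidates.
print((38, 25) in shapes.values())  # False
```

Under this rule, height 38 is only reachable with padding='same' and no output_padding, and that same setting forces the width to 26, which would explain why (38,25) never shows up.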

Meanwhile using tf.nn.conv2d_transpose:

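# Input with the pooled shape (batch, 19, 13, channels)
x = np.arange(19*13*3).reshape([1,19,13,3]).astype('float32')
# Kernel laid out as (height, width, output_channels, input_channels)
w = np.random.normal(size=(4,4,3,3)).astype('float32')
# The target shape is passed explicitly as output_shape
tf_op = tf.nn.conv2d_transpose(x,w,[1,38,25,3], strides=[1,2,2,1], padding='SAME')
# TF 1.x graph mode
with tf.Session() as sess:
    print(sess.run(tf_op).shape)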

Output: (1, 38, 25, 3)

Maybe tf.nn.conv2d_transpose applies some step that tf.keras doesn't.
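My guess is that tf.nn.conv2d_transpose accepts any output_shape for which the matching forward convolution would give back the input shape. The sketch below lists those lengths per axis. It assumes the usual forward rules (ceil(n / stride) for 'SAME', ceil((n - kernel + 1) / stride) for 'VALID'); the helper names and the search bound are mine, not TensorFlow API.

```python
import math


def forward_conv_length(length, kernel, stride, padding):
    """Output length of a forward strided convolution along one axis."""
    if padding == "SAME":
        return math.ceil(length / stride)
    if padding == "VALID":
        return math.ceil((length - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def compatible_output_lengths(input_length, kernel, stride, padding, search_up_to=200):
    """Lengths that a forward convolution would map back to input_length."""
    return [
        n for n in range(1, search_up_to)
        if forward_conv_length(n, kernel, stride, padding) == input_length
    ]


height_options = compatible_output_lengths(19, kernel=4, stride=2, padding="SAME")
width_options = compatible_output_lengths(13, kernel=4, stride=2, padding="SAME")

print(height_options)  # [37, 38]
print(width_options)   # [25, 26]
```

If this is right, (38,25) and (38,26) are both valid targets for a (19,13) input with 'SAME' padding. The explicit output_shape picks one of them, whereas the Keras layer has to choose one by itself.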

I hope someone can help me. Thanks for reading.

Source: https://stackoverflow.com/questions/57640507/how-to-get-properly-the-outputs-shape-when-converting-code-from-tf-nn-conv2d-tr

Tags

python

tensorflow

machine-learning

keras

conv-neural-network