问题
I have a problem of applying masking layer to CNNs in RNN/LSTM model.
My data is not original image, but I converted into a shape of (16, 34, 4)(channels_first). The data is sequential, and the longest step length is 22. So for invariant way, I set the timestep as 22. Since it may be shorter than 22 steps, I fill others with np.zeros. However, for 0 padding data, it's about half among all dataset, so with 0 paddings, the training cannot reach a very good result with so much useless data. Then I want to add a mask to cancel these 0 padding data.
Here is my code.
mask = np.zeros((16,34,4), dtype = np.int8)
input_shape = (22, 16, 34, 4)
model = Sequential()
model.add(TimeDistributed(Masking(mask_value=mask), input_shape=input_shape, name = 'mask'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name = 'conv1'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn1'))
model.add(Dropout(0.5, name = 'drop1'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv2'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn2'))
model.add(Dropout(0.5, name = 'drop2'))
model.add(TimeDistributed(Conv2D(100, (5, 2), data_format = 'channels_first', activation = relu), name ='conv3'))
model.add(TimeDistributed(BatchNormalization(), name = 'bn3'))
model.add(Dropout(0.5, name = 'drop3'))
model.add(TimeDistributed(Flatten(), name = 'flatten'))
model.add(GRU(256, activation='tanh', return_sequences=True, name = 'gru'))
model.add(Dropout(0.4, name = 'drop_gru'))
model.add(Dense(35, activation = 'softmax', name = 'softmax'))
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['acc'])
Here's the model structure.
model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mask (TimeDist (None, 22, 16, 34, 4) 0
_________________________________________________________________
conv1 (TimeDistributed) (None, 22, 100, 30, 3) 16100
_________________________________________________________________
bn1 (TimeDistributed) (None, 22, 100, 30, 3) 12
_________________________________________________________________
drop1 (Dropout) (None, 22, 100, 30, 3) 0
_________________________________________________________________
conv2 (TimeDistributed) (None, 22, 100, 26, 2) 100100
_________________________________________________________________
bn2 (TimeDistributed) (None, 22, 100, 26, 2) 8
_________________________________________________________________
drop2 (Dropout) (None, 22, 100, 26, 2) 0
_________________________________________________________________
conv3 (TimeDistributed) (None, 22, 100, 22, 1) 100100
_________________________________________________________________
bn3 (TimeDistributed) (None, 22, 100, 22, 1) 4
_________________________________________________________________
drop3 (Dropout) (None, 22, 100, 22, 1) 0
_________________________________________________________________
flatten (TimeDistributed) (None, 22, 2200) 0
_________________________________________________________________
gru (GRU) (None, 22, 256) 1886976
_________________________________________________________________
drop_gru (Dropout) (None, 22, 256) 0
_________________________________________________________________
softmax (Dense) (None, 22, 35) 8995
=================================================================
Total params: 2,112,295
Trainable params: 2,112,283
Non-trainable params: 12
_________________________________________________________________
For mask_value, I tried with either 0 or this mask structure, but neither works and it still trains through all the data with half 0 paddings in it.
Can anyone help me?
B.T.W., I used TimeDistributed here to connect RNN, and I know another one called ConvLSTM2D. Does anyone know the difference? ConvLSTM2D takes much more params for the model, and get training much slower than TimeDistributed...
回答1:
Unfortunately masking is not yet supported by the Keras Conv layers. There have been several issues posted about this on the Keras Github page, here is the one with the most substantial conversation on the topic. It appears that there was some hang up implementation details and the issue was never resolved.
The workaround proposed in the discussion is to have an explicit embedding for the padding character in sequences and do global pooling. Here is another workaround I found (not helpful for my use case but maybe helpful to you) - keeping a mask array to merge through multiplication.
You can also check out the conversation around this question which is similar to yours.
来源:https://stackoverflow.com/questions/53977616/how-to-apply-masking-layer-to-sequential-cnn-model-in-keras