I have sequences of long 1-D vectors (3000 values each) that I am trying to classify. I have previously implemented a simple CNN to classify them with relative success:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def create_shallow_model(shape, repeat_length, stride):
    model = Sequential()
    model.add(Conv1D(75, repeat_length, strides=stride, padding='same',
                     input_shape=shape, activation='relu'))
    model.add(MaxPooling1D(repeat_length))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])
    return model
However, I am looking to improve performance by stacking an LSTM/RNN onto the end of the network. I am having difficulty with this, as I cannot seem to find a way for the network to accept the data.
from keras.layers import TimeDistributed, LSTM

def cnn_lstm(shape, repeat_length, stride):
    model = Sequential()
    model.add(TimeDistributed(Conv1D(75, repeat_length, strides=stride,
                                     padding='same', activation='relu'),
                              input_shape=(None,) + shape))
    model.add(TimeDistributed(MaxPooling1D(repeat_length)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(6, return_sequences=True))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])
    return model
model = cnn_lstm(X.shape[1:], 1000, 1)
tprs, aucs = calculate_roc(model, 3, 100, train_X, train_y,
                           test_X, test_y, tprs, aucs)
But I get the following error:
ValueError: Error when checking input: expected time_distributed_4_input to have 4 dimensions, but got array with shape (50598, 3000, 1)
My questions are:
Is this a correct way of analysing this data?
If so, how do I get the network to accept and classify the input sequences?
There is no need to add those TimeDistributed wrappers: TimeDistributed expects every sample to carry an extra time dimension (a 4-D input overall), which is exactly why Keras rejects your 3-D array of shape (50598, 3000, 1). Currently, before adding the LSTM layer, your model looks like this (I have assumed repeat_length=5 and stride=1):
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_2 (Conv1D)            (None, 3000, 75)          450
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 600, 75)           0
_________________________________________________________________
flatten_2 (Flatten)          (None, 45000)             0
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 45001
=================================================================
Total params: 45,451
Trainable params: 45,451
Non-trainable params: 0
_________________________________________________________________
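For reference, a summary like the one above can be printed with model.summary(); a minimal sketch, assuming the (3000, 1) input shape implied by your error message:

# Assumes input shape (3000, 1), inferred from the (50598, 3000, 1) array in the error.
model = create_shallow_model((3000, 1), 5, 1)
model.summary()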
So if you want to add an LSTM layer, you can put it right after the MaxPooling1D layer, e.g. model.add(LSTM(16, activation='relu')), and just remove the Flatten layer, since the LSTM consumes the 3-D (batch, steps, features) output of the pooling layer directly. Now the model looks like this:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_4 (Conv1D)            (None, 3000, 75)          450
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 600, 75)           0
_________________________________________________________________
lstm_1 (LSTM)                (None, 16)                5888
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 17
=================================================================
Total params: 6,355
Trainable params: 6,355
Non-trainable params: 0
_________________________________________________________________
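Putting this together, here is a minimal sketch of the fixed model (the function name cnn_lstm_fixed is mine; the layer choices follow the suggestion above, and it restores binary_crossentropy as discussed below):

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

def cnn_lstm_fixed(shape, repeat_length, stride):
    model = Sequential()
    # Plain Conv1D/MaxPooling1D over the 1-D sequences -- no TimeDistributed.
    model.add(Conv1D(75, repeat_length, strides=stride, padding='same',
                     input_shape=shape, activation='relu'))
    model.add(MaxPooling1D(repeat_length))
    # The LSTM reads the pooled sequence directly; no Flatten needed.
    model.add(LSTM(16, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop',
                  metrics=['accuracy'])
    return model

# Usage mirrors the original call:
# model = cnn_lstm_fixed(X.shape[1:], 1000, 1)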
If you want, you can pass the return_sequences=True argument to the LSTM layer and keep the Flatten layer, as sketched below. But only do that after you have tried the first approach and gotten poor results, since return_sequences=True may not be necessary at all; it only increases your model size and can degrade performance.
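For completeness, relative to the sketch above that variant only changes the tail of the model:

# Variant: keep the LSTM output at every timestep and flatten it
# for the final Dense layer (Flatten must be imported as well).
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))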
As a side note: why did you change the loss function to sparse_categorical_crossentropy in the second model? There is no need to do that; binary_crossentropy works fine with a single sigmoid output.
Source: https://stackoverflow.com/questions/51992336/keras-cnn-lstm-input-layer-not-accepting-1-d-input