I'm trying to use the following ConvLSTM2D architecture to estimate high resolution image sequences from low resolution ones:
    import numpy as np, scipy.ndimage, matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation, Flatten
    from keras.layers import Convolution2D, ConvLSTM2D, MaxPooling2D, UpSampling2D
    from sklearn.metrics import accuracy_score, confusion_matrix, cohen_kappa_score
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    np.random.seed(123)

    raw = np.arange(96).reshape(8,3,4)
    data1 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=1, mode='nearest') #low res
    print (data1.shape) #(8, 300, 400)
    data2 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=3, mode='nearest') #high res
    print (data2.shape) #(8, 300, 400)

    X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
    Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)
    #(samples,time, rows, cols, channels)

    model = Sequential()
    input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1) #samples, time, rows, cols, channels
    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
    print (model.summary())

    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, Y_train, batch_size=1, epochs=10, verbose=1)
    x,y = model.evaluate(X_train, Y_train, verbose=0)
    print (x,y)
This declaration results in the following ValueError:
ValueError: Input 0 is incompatible with layer conv_lst_m2d_2: expected ndim=5, found ndim=4
How can I correct this ValueError? I think the problem is with the input shapes, but I could not figure out what exactly is wrong.
Note that the output should also be sequences of images, not a classification result.
This happens because LSTMs require temporal (sequence) input, but your first ConvLSTM2D was declared as a many-to-one layer, which outputs a tensor of shape (batch_size, 300, 400, 16), that is, batches of images rather than sequences of images:
    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
You want the first layer to output a tensor of shape (batch_size, 8, 300, 400, 16) (i.e. sequences of images) so that it can be consumed by the second LSTM. The fix is to add return_sequences=True to the first LSTM definition:
    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                         return_sequences=True))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
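To make the shape rule concrete, here is a minimal pure-Python sketch of how the output shape depends on return_sequences. The helper `convlstm2d_output_shape` is hypothetical (it is not part of Keras) and assumes padding='same', a stride of 1 and channels-last data, as in the question's code.

```python
def convlstm2d_output_shape(input_shape, filters, return_sequences=False):
    """Hypothetical helper, not a Keras function.

    Assumes padding='same', strides of 1 and channels-last inputs, so the
    spatial size is unchanged and only the channel axis becomes `filters`.
    """
    batch, time, rows, cols, _channels = input_shape
    if return_sequences:
        # One output frame per time step: still a 5D tensor.
        return (batch, time, rows, cols, filters)
    # Only the last time step is returned: the time axis disappears (4D).
    return (batch, rows, cols, filters)


in_shape = (None, 8, 300, 400, 1)

# Without return_sequences the first layer emits 4D data, which a second
# ConvLSTM2D (expecting ndim=5) cannot consume.
many_to_one = convlstm2d_output_shape(in_shape, filters=16)

# With return_sequences=True the time axis is kept.
many_to_many = convlstm2d_output_shape(in_shape, filters=16, return_sequences=True)

print(many_to_one)   # (None, 300, 400, 16)
print(many_to_many)  # (None, 8, 300, 400, 16)
```

The 4D result in the first case is exactly the "expected ndim=5, found ndim=4" mismatch reported in the error.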
You mentioned classification. If what you intend is to classify entire sequences, then you need a classifier at the end:
    from keras.layers import GlobalAveragePooling2D  # not imported in the question's code

    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                         return_sequences=True))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
    model.add(GlobalAveragePooling2D())
    model.add(Dense(10, activation='softmax'))  # output shape: (None, 10)
But if you are trying to classify each image within the sequences, then you can simply reapply the classifier to every time step using TimeDistributed:
    from keras.models import Model
    from keras.layers import Input, TimeDistributed, GlobalAveragePooling2D

    # Per-image classifier: one (300, 400, 8) feature map -> 10 class probabilities
    x = Input(shape=(300, 400, 8))
    y = GlobalAveragePooling2D()(x)
    y = Dense(10, activation='softmax')(y)
    classifier = Model(inputs=x, outputs=y)

    # Sequence model: both ConvLSTM2D layers keep the time axis
    x = Input(shape=(data1.shape[0], data1.shape[1], data1.shape[2], 1))
    y = ConvLSTM2D(16, kernel_size=(3, 3), activation='sigmoid', padding='same', return_sequences=True)(x)
    y = ConvLSTM2D(8, kernel_size=(3, 3), activation='sigmoid', padding='same', return_sequences=True)(y)
    y = TimeDistributed(classifier)(y)  # output shape: (None, 8, 10)
    model = Model(inputs=x, outputs=y)
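Conceptually, TimeDistributed applies the same classifier, with the same weights, to every time step. The following NumPy sketch illustrates that idea only; it is not how Keras implements the wrapper, and the weights, the toy shapes and the `classify_frames` function are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the second ConvLSTM2D output: (batch, time, rows, cols, channels).
features = rng.random((2, 8, 6, 5, 8))

# Hypothetical classifier weights, shared by every time step.
weights = rng.normal(size=(8, 10))
bias = np.zeros(10)


def classify_frames(frames):
    """Global average pooling followed by a softmax layer, per image."""
    pooled = frames.mean(axis=(1, 2))                     # (n, channels)
    logits = pooled @ weights + bias                      # (n, 10)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


batch, time = features.shape[:2]
# Fold the time axis into the batch axis, classify, then unfold it again.
flat = features.reshape(batch * time, *features.shape[2:])
probs = classify_frames(flat).reshape(batch, time, -1)

print(probs.shape)  # (2, 8, 10)
```

Each of the 8 frames in a sequence gets its own 10-way probability vector, which matches the (None, 8, 10) output shape noted above.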
Finally, take a look at the examples in the Keras repository. There's one for a generative model using ConvLSTM2D.
Edit: to estimate data2 from data1...
If I got it right this time, X_train should be 1 sample of a stack of 8 (300, 400, 1) images, not 8 samples of a stack of 1 image of shape (300, 400, 1).
If that's true, then:
    X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
    Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)
Should be updated to:
    X_train = data1.reshape(1, data1.shape[0], data1.shape[1], data1.shape[2], 1)
    Y_train = data2.reshape(1, data2.shape[0], data2.shape[1], data2.shape[2], 1)
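The difference between the two reshapes is easier to see on a small array. This sketch uses a tiny stand-in (`frames`, 8 frames of 3x4 pixels) instead of the real 300x400 data; the variable names are illustrative only.

```python
import numpy as np

# Small stand-in for data1: 8 frames of 3x4 pixels (the real frames are 300x400).
frames = np.arange(96).reshape(8, 3, 4)

# 8 samples, each a sequence of length 1 (the original reshape).
eight_samples = frames.reshape(frames.shape[0], 1, frames.shape[1], frames.shape[2], 1)

# 1 sample holding a sequence of 8 frames (the corrected reshape).
one_sample = frames.reshape(1, frames.shape[0], frames.shape[1], frames.shape[2], 1)

print(eight_samples.shape)  # (8, 1, 3, 4, 1)
print(one_sample.shape)     # (1, 8, 3, 4, 1)

# Frame t of the single sequence is still the original frame t.
assert np.array_equal(one_sample[0, 5, :, :, 0], frames[5])

# Equivalent spelling using np.newaxis instead of reshape.
also_one_sample = frames[np.newaxis, ..., np.newaxis]
assert np.array_equal(one_sample, also_one_sample)
```

Both layouts hold the same pixels; only the second one tells the LSTM that the 8 frames belong to one sequence.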
Also, accuracy doesn't usually make sense when your loss is mse, because the targets are continuous values rather than class labels. You can use other metrics such as mae.
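As a rough illustration of why accuracy is uninformative for regression, the sketch below compares mse and mae with a naive exact-match score on made-up values. The exact-match score is a simplification for the illustration, not necessarily how Keras computes its accuracy metric.

```python
import numpy as np

y_true = np.array([0.0, 12.5, 47.0, 95.0])
y_pred = y_true + 0.5  # predictions that are close, but never exactly equal

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))

# An exact-match "accuracy" treats every near miss as a complete failure.
exact_match = np.mean(y_true == y_pred)

print(mse, mae, exact_match)  # 0.25 0.5 0.0
```

In Keras, mae can be requested at compile time, for example model.compile(loss='mse', optimizer='adam', metrics=['mae']).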
Now you just need to update your model to return sequences and to have a single unit in the last layer (because the images you are trying to estimate have a single channel):
    model = Sequential()
    input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1)
    model.add(ConvLSTM2D(16, kernel_size=(3, 3), activation='sigmoid', padding='same',
                         input_shape=input_shape,
                         return_sequences=True))
    model.add(ConvLSTM2D(1, kernel_size=(3, 3), activation='sigmoid', padding='same',
                         return_sequences=True))
    model.compile(loss='mse', optimizer='adam')
After that, model.fit(X_train, Y_train, ...) will start training normally:
    Using TensorFlow backend.
    (8, 300, 400)
    (8, 300, 400)
    Epoch 1/10
    1/1 [==============================] - 5s 5s/step - loss: 2993.8701
    Epoch 2/10
    1/1 [==============================] - 5s 5s/step - loss: 2992.4492
    Epoch 3/10
    1/1 [==============================] - 5s 5s/step - loss: 2991.4536
    Epoch 4/10
    1/1 [==============================] - 5s 5s/step - loss: 2989.8523
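One possible reading of the loss values in this log (an assumption, not verified against the actual run): with activation='sigmoid' the layer outputs stay between 0 and 1, while the targets built from np.arange(96) span roughly 0 to 95, so the mse cannot drop far below a few thousand. The NumPy sketch below uses a small stand-in for the targets and shows min-max scaling as one way to bring them into the reachable range; the variable names are illustrative.

```python
import numpy as np

# Stand-in for the targets: same value range as raw in the question (0 to 95).
targets = np.arange(96, dtype=np.float64).reshape(8, 3, 4)

# If every prediction saturates at 1, the upper bound of a sigmoid,
# the error against unscaled targets stays large.
bounded_mse = np.mean((targets - 1.0) ** 2)
print(bounded_mse)

# Min-max scale the targets into [0, 1] so a sigmoid-bounded output can reach them.
lo, hi = targets.min(), targets.max()
scaled = (targets - lo) / (hi - lo)

# Invert the scaling to get values back in the original units.
restored = scaled * (hi - lo) + lo
```

If this reading applies, the same scaling would need to be applied to the inputs and undone on the predictions; alternatively, the last layer could use an unbounded activation.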