
Question:

I'm trying to use the following ConvLSTM2D architecture to estimate high resolution image sequences from low resolution ones:

    import numpy as np, scipy.ndimage, matplotlib.pyplot as plt
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation, Flatten
    from keras.layers import Convolution2D, ConvLSTM2D, MaxPooling2D, UpSampling2D
    from sklearn.metrics import accuracy_score, confusion_matrix, cohen_kappa_score
    from sklearn.preprocessing import MinMaxScaler, StandardScaler
    np.random.seed(123)

    raw = np.arange(96).reshape(8,3,4)
    data1 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=1, mode='nearest') #low res
    print (data1.shape)
    #(8, 300, 400)

    data2 = scipy.ndimage.zoom(raw, zoom=(1,100,100), order=3, mode='nearest') #high res
    print (data2.shape)
    #(8, 300, 400)

    X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
    Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)
    #(samples,time, rows, cols, channels)

    model = Sequential()
    input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1)
    #samples, time, rows, cols, channels
    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))

    print (model.summary())

    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['accuracy'])

    model.fit(X_train, Y_train,
              batch_size=1, epochs=10, verbose=1)

    x,y = model.evaluate(X_train, Y_train, verbose=0)
    print (x,y)

This code raises the following ValueError:

ValueError: Input 0 is incompatible with layer conv_lst_m2d_2: expected ndim=5, found ndim=4

How can I correct this ValueError? I think the problem is with the input shapes, but I could not figure out what exactly is wrong.
Note that the output should also be a sequence of images, not a classification result.

Answer 1:

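This happens because a ConvLSTM2D layer expects a sequence as input (5 dimensions: batch, time, rows, cols, channels), but by default it returns only its last output. Your first layer was therefore declared as a many-to-one model, which outputs a tensor of shape (batch_size, 300, 400, 16). That is, a batch of single images rather than sequences: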

    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))

You want the first layer's output to be a tensor of shape (batch_size, 8, 300, 400, 16) (i.e. sequences of images), so that it can be consumed by the second ConvLSTM2D. The fix is to set return_sequences=True in the first layer's definition:

    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                         return_sequences=True))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
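The shape bookkeeping behind this fix can be traced without building a model. The following is only a sketch: `convlstm2d_output_shape` is a hypothetical helper written for this illustration (it is not part of Keras), and it assumes padding='same', stride 1 and channels-last data, as in the code above.

```python
def convlstm2d_output_shape(input_shape, filters, return_sequences=False):
    """Hypothetical helper: output shape of a ConvLSTM2D layer.

    Assumes padding='same', stride 1, channels-last input of shape
    (batch, time, rows, cols, channels).
    """
    if len(input_shape) != 5:
        raise ValueError(f"expected ndim=5, found ndim={len(input_shape)}")
    batch, time, rows, cols, _ = input_shape
    if return_sequences:
        # One output frame per input frame: the time axis is kept.
        return (batch, time, rows, cols, filters)
    # Only the last output is returned: the time axis disappears.
    return (batch, rows, cols, filters)


x_shape = (None, 8, 300, 400, 1)

# Default behaviour: the first layer drops the time axis, so a second
# ConvLSTM2D receives a 4-D tensor and is rejected.
h = convlstm2d_output_shape(x_shape, filters=16)
print(h)  # (None, 300, 400, 16)
try:
    convlstm2d_output_shape(h, filters=8)
except ValueError as err:
    print(err)  # expected ndim=5, found ndim=4

# With return_sequences=True the time axis survives and the layers stack.
h_seq = convlstm2d_output_shape(x_shape, filters=16, return_sequences=True)
print(h_seq)  # (None, 8, 300, 400, 16)
print(convlstm2d_output_shape(h_seq, filters=8))  # (None, 300, 400, 8)
```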

You mentioned classification. If what you intend is to classify entire sequences, then you need a classifier at the end:

    # GlobalAveragePooling2D is not among the question's imports
    from keras.layers import GlobalAveragePooling2D

    model.add(ConvLSTM2D(16, kernel_size=(3,3), activation='sigmoid',padding='same',input_shape=input_shape,
                         return_sequences=True))
    model.add(ConvLSTM2D(8, kernel_size=(3,3), activation='sigmoid',padding='same'))
    model.add(GlobalAveragePooling2D())
    model.add(Dense(10, activation='softmax'))  # output shape: (None, 10)

But if you are trying to classify each image within the sequences, then you can simply reapply the classifier using TimeDistributed:

    # Names used below that are not among the question's imports
    from keras.models import Model
    from keras.layers import Input, GlobalAveragePooling2D, TimeDistributed

    # Per-image classifier: one (300, 400, 8) feature map -> 10 class probabilities
    x = Input(shape=(300, 400, 8))
    y = GlobalAveragePooling2D()(x)
    y = Dense(10, activation='softmax')(y)
    classifier = Model(inputs=x, outputs=y)

    x = Input(shape=(data1.shape[0], data1.shape[1], data1.shape[2], 1))
    y = ConvLSTM2D(16, kernel_size=(3, 3),
                   activation='sigmoid',
                   padding='same',
                   return_sequences=True)(x)
    y = ConvLSTM2D(8, kernel_size=(3, 3),
                   activation='sigmoid',
                   padding='same',
                   return_sequences=True)(y)
    y = TimeDistributed(classifier)(y)  # output shape: (None, 8, 10)

    model = Model(inputs=x, outputs=y)
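Conceptually, TimeDistributed applies one classifier, with one set of weights, to every time step. A rough NumPy sketch of that idea follows; `classifier` and `time_distributed` are hypothetical stand-ins written for this illustration, not the Keras implementation, and the feature maps are random and much smaller than 300x400 to keep it fast.

```python
import numpy as np

rng = np.random.default_rng(0)


def classifier(frames, weights, bias):
    """Global average pooling followed by a softmax dense layer.

    frames: (n, rows, cols, channels) -> returns (n, n_classes)
    """
    pooled = frames.mean(axis=(1, 2))             # (n, channels)
    logits = pooled @ weights + bias              # (n, n_classes)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def time_distributed(fn, sequences, *args):
    """Apply fn to every time step by folding time into the batch axis."""
    batch, time = sequences.shape[:2]
    flat = sequences.reshape((batch * time,) + sequences.shape[2:])
    out = fn(flat, *args)
    return out.reshape((batch, time) + out.shape[1:])


# Stand-in for the output of the second ConvLSTM2D:
# (batch, time, rows, cols, channels)
features = rng.normal(size=(2, 8, 6, 5, 8))
weights = rng.normal(size=(8, 10))   # shared across all time steps
bias = np.zeros(10)

probs = time_distributed(classifier, features, weights, bias)
print(probs.shape)  # (2, 8, 10)
```

Each of the 8 frames in a sequence gets its own 10-way probability vector, which matches the (None, 8, 10) output shape noted above.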

Finally, take a look at the examples in the Keras repository. There's one for a generative model using ConvLSTM2D.

Edit: to estimate data2 from data1...

If I got it right this time, X_train should be 1 sample of a stack of 8 (300, 400, 1) images, not 8 samples of a stack of 1 image of shape (300, 400, 1).
If that's true, then:

    X_train = data1.reshape(data1.shape[0], 1, data1.shape[1], data1.shape[2], 1)
    Y_train = data2.reshape(data2.shape[0], 1, data2.shape[1], data2.shape[2], 1)

Should be updated to:

    X_train = data1.reshape(1, data1.shape[0], data1.shape[1], data1.shape[2], 1)
    Y_train = data2.reshape(1, data2.shape[0], data2.shape[1], data2.shape[2], 1)
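The difference between the two layouts is easiest to see on the small source array. This sketch reuses `raw` from the question but skips the 100x zoom; the names `as_samples` and `as_sequence` are introduced here only for illustration.

```python
import numpy as np

# Same toy source array as in the question, without the 100x zoom.
raw = np.arange(96).reshape(8, 3, 4)   # 8 frames of 3x4 pixels

# Original reshape: 8 samples, each a sequence of length 1.
as_samples = raw.reshape(raw.shape[0], 1, raw.shape[1], raw.shape[2], 1)
# Updated reshape: 1 sample holding a sequence of 8 frames.
as_sequence = raw.reshape(1, raw.shape[0], raw.shape[1], raw.shape[2], 1)

print(as_samples.shape)   # (8, 1, 3, 4, 1)
print(as_sequence.shape)  # (1, 8, 3, 4, 1)

# Frame t holds the same pixels in both layouts; only its role changes
# (a separate sample versus one time step of a single sample).
t = 5
assert np.array_equal(as_samples[t, 0, :, :, 0], raw[t])
assert np.array_equal(as_sequence[0, t, :, :, 0], raw[t])
```

Only the second layout has a time axis of length 8, which is what an input_shape of (8, 300, 400, 1) describes.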

Also, accuracy doesn't usually make sense when your loss is mse: it is a classification metric, and the targets here are continuous pixel values. You can use a regression metric such as mae instead.
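A small numeric illustration of that point, using made-up values: an exact-match score (used here as a simplified stand-in for an accuracy-style metric, not Keras' exact definition) reports 0 even for predictions that are close, while mae reflects how close they are.

```python
import numpy as np

# Made-up continuous targets and fairly good predictions.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([10.5, 19.0, 30.2, 41.0])

mse = np.mean((y_true - y_pred) ** 2)       # the training loss
mae = np.mean(np.abs(y_true - y_pred))      # average error, in pixel units
exact_match = np.mean(y_true == y_pred)     # fraction of exactly equal values

print(round(float(mse), 4))   # 0.5725
print(round(float(mae), 4))   # 0.675
print(float(exact_match))     # 0.0
```

In Keras the regression metric would be requested with metrics=['mae'] in model.compile.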

Now you just need to update your model so that both layers return sequences and the last layer has a single filter (because the images you are trying to estimate have a single channel):

    model = Sequential()
    input_shape = (data1.shape[0], data1.shape[1], data1.shape[2], 1)
    model.add(ConvLSTM2D(16, kernel_size=(3, 3), activation='sigmoid', padding='same',
                         input_shape=input_shape,
                         return_sequences=True))
    model.add(ConvLSTM2D(1, kernel_size=(3, 3), activation='sigmoid', padding='same',
                         return_sequences=True))

    model.compile(loss='mse', optimizer='adam')

After that, model.fit(X_train, Y_train, ...) will start training normally:

    Using TensorFlow backend.
    (8, 300, 400)
    (8, 300, 400)
    Epoch 1/10
    1/1 [==============================] - 5s 5s/step - loss: 2993.8701
    Epoch 2/10
    1/1 [==============================] - 5s 5s/step - loss: 2992.4492
    Epoch 3/10
    1/1 [==============================] - 5s 5s/step - loss: 2991.4536
    Epoch 4/10
    1/1 [==============================] - 5s 5s/step - loss: 2989.8523
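The loss in this log stays close to 2990. One likely reason, offered here as an assumption rather than something the answer states, is that the sigmoid activation keeps the last layer's output between 0 and 1 while the targets range roughly from 0 to 95. The sketch below uses the un-zoomed `raw` values as a stand-in for data2 and shows a plain min-max rescaling; the names `scaled`, `unscale` and `best_unscaled_mse` are introduced only for this illustration.

```python
import numpy as np

# Stand-in for data2: the un-zoomed values from the question, 0..95.
data = np.arange(96, dtype="float64").reshape(8, 3, 4)

# Best MSE any output confined to [0, 1] can reach on the raw targets:
# predict the closest allowed value for every pixel.
best_unscaled_mse = np.mean((data - np.clip(data, 0.0, 1.0)) ** 2)
print(best_unscaled_mse)  # about 2930

# Min-max scaling puts the targets in [0, 1], which a sigmoid can reach.
lo, hi = data.min(), data.max()
scaled = (data - lo) / (hi - lo)
print(scaled.min(), scaled.max())  # 0.0 1.0


def unscale(pred):
    """Map values from [0, 1] back to the original pixel range."""
    return pred * (hi - lo) + lo


# Round trip: scaling then unscaling recovers the original values.
assert np.allclose(unscale(scaled), data)
```

Under this assumption, the model would be trained on scaled inputs and targets, and `unscale` would be applied to its predictions afterwards.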

Source: Estimating high resolution images from lower ones using a Keras model based on ConvLSTM2D
