How do you pass video features from a CNN to an LSTM?

Posted by 徘徊边缘 on 2019-11-30 09:17:55

Basically, you can flatten the features of each frame and feed them into the LSTM, one time step per frame. With a CNN in front it is the same idea: feed each per-frame CNN output into the LSTM.

Whether to put fully connected (FC) layers between the CNN features and the LSTM is up to you; see the sketch below.
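As a minimal sketch of the "flatten each frame" idea, assuming 3 frames of 200x100 grayscale video (all shapes and names here are illustrative, chosen to match the example further below):

from tensorflow.keras.layers import Input, TimeDistributed, Flatten, LSTM
from tensorflow.keras.models import Model

frames_in = Input(shape=(3, 200, 100, 1))            # (frames, height, width, channels)
flat_frames = TimeDistributed(Flatten())(frames_in)  # each frame -> one 20000-dim vector
seq_features = LSTM(128)(flat_frames)                # one LSTM step per frame
baseline_model = Model(inputs=frames_in, outputs=seq_features)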

For an example of such a network structure, see http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-180.pdf.

The architecture of the CNN+LSTM model looks like the code below. Basically, you wrap each CNN layer in a TimeDistributed wrapper so it is applied to every frame, and then pass the output of the CNN to the LSTM layer:

from tensorflow.keras.layers import (Input, TimeDistributed, Conv2D,
                                     MaxPooling2D, Flatten, Dense, LSTM)
from tensorflow.keras.models import Model

cnn_input = Input(shape=(3, 200, 100, 1))   # frames, height, width, channels of the image
conv1 = TimeDistributed(Conv2D(32, kernel_size=(50, 5), activation='relu'))(cnn_input)
conv2 = TimeDistributed(Conv2D(32, kernel_size=(20, 5), activation='relu'))(conv1)
pool1 = TimeDistributed(MaxPooling2D(pool_size=(4, 4)))(conv2)
flat = TimeDistributed(Flatten())(pool1)
cnn_op = TimeDistributed(Dense(100))(flat)   # one 100-dim feature vector per frame

After this you can pass your CNN output to the LSTM:

lstm = LSTM(128, return_sequences=True, activation='tanh')(cnn_op)   # consume the per-frame CNN features
op = TimeDistributed(Dense(100))(lstm)
fun_model = Model(inputs=[cnn_input], outputs=op)

Please remember that the input to this time-distributed CNN must have shape (# of frames, row_size, column_size, channels) per sample.
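For instance, a quick shape sanity check against the model built above (the batch size of 8 and the zero-filled data are just placeholders):

import numpy as np

dummy_clips = np.zeros((8, 3, 200, 100, 1), dtype='float32')   # (batch, frames, rows, cols, channels)
print(fun_model.predict(dummy_clips).shape)                    # -> (8, 3, 100): one 100-dim output per frame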

And finally, you can apply a softmax at the last layer to get class predictions.
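For example, for clip-level classification you could swap the final layers for a softmax head on the last LSTM state. This sketch reuses cnn_input and cnn_op from above, and num_classes is a placeholder for your own label count:

from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Model

num_classes = 10   # placeholder: set to your number of classes

last_state = LSTM(128, activation='tanh')(cnn_op)   # keep only the final time step
predictions = Dense(num_classes, activation='softmax')(last_state)
clf_model = Model(inputs=[cnn_input], outputs=predictions)
clf_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])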
