How to use tf.nn.ctc_loss in cnn+ctc network

Submitted on 2019-12-08 03:58:28

Question


Recently, I have been trying to use TensorFlow to implement a CNN+CTC network based on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks.

I feed a batch of spectrogram data (shape: (10, 120, 155, 3), batch size 10) through 10 convolution layers and 3 fully connected layers, so the output before the CTC layer is 2D data (shape: (10, 1024)).

Here is my problem: I want to use the tf.nn.ctc_loss function from the TensorFlow library, but it raises ValueError: Dimension must be 2 but is 3 for 'transpose' (op: 'Transpose') with input shapes: [?,1024],[3].

I guess the error is related to the dimensionality of my 2D input data. The description of ctc_loss on the official TensorFlow site says it requires a 3D input with shape (batch_size x max_time x num_classes).

So what is the extra 'num_classes' dimension? How should I change the shape of my CNN+FC output data?


Answer 1:


The fully connected layer should be applied per time step, much like applying the same dense layer at each time step in a recurrent neural network. For the output of the convolution layers, the time axis is the width dimension.

So for example, output shape would be:

  1. convolution output: (10, 120, 155, 3) = (batch, height, width, channels)
  2. flatten, treating width as time: (10, 155, 120*3) = (batch, max_time, features)
  3. fully connected: (10, 155, 1024) (the same dense layer applied per time step)
  4. output projection: (10, 155, num_classes)

This is the shape that ctc_loss in TensorFlow expects.
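The shape transformations in the steps above can be sketched with NumPy (shapes only, weights left as zeros; `num_classes = 30` is an assumed vocabulary size, not from the question):

```python
import numpy as np

# Shapes from the question: batch=10, height=120, width=155, channels=3.
batch, height, width, channels = 10, 120, 155, 3
num_units = 1024
num_classes = 30  # assumed: characters + CTC blank

conv_out = np.zeros((batch, height, width, channels), dtype=np.float32)

# Treat width as the time axis: move it next to batch, then
# flatten (height, channels) into a single feature dimension.
x = np.transpose(conv_out, (0, 2, 1, 3))           # (10, 155, 120, 3)
x = x.reshape(batch, width, height * channels)     # (10, 155, 360)

# "Same dense layer per time step": one weight matrix multiplied
# against every (batch, time) slice at once.
w1 = np.zeros((height * channels, num_units), dtype=np.float32)
fc = x @ w1                                        # (10, 155, 1024)

# Final projection to class logits, again shared across time steps.
w2 = np.zeros((num_units, num_classes), dtype=np.float32)
logits = fc @ w2                                   # (10, 155, 30)

print(logits.shape)  # (batch, max_time, num_classes), as ctc_loss expects
```

In TensorFlow the per-time-step dense layer can be expressed the same way, since a Dense/matmul applied to a 3D tensor acts on the last axis only, leaving the batch and time axes intact.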



Source: https://stackoverflow.com/questions/44762631/how-to-use-tf-nn-ctc-loss-in-cnnctc-network
