Building Speech Dataset for LSTM binary classification

血红的双手。 提交于 2019-11-27 16:09:35

There are many existing implementation for example Tensorflow Implementation, Kaldi-focused implementation with all the scripts, it is better to check them first.

Theano is too low-level, you might try with keras instead, as described in tutorial. You can run tutorial "as is" to understand how things goes.

Then, you need to prepare a dataset. You need to turn your data into sequences of data frames and for every data frame in sequence you need to assign an output label.

Keras supports two types of RNNs - layers returning sequences and layers returning simple values. You can experiment with both, in code you just use return_sequences=True or return_sequences=False

To train with sequences you can assign dummy label for all frames except the last one where you can assign the label of the word you want to recognize. You need to place input and output labels to arrays. So it will be:

X = [[word1frame1, word1frame2, ..., word1framen],[word2frame1, word2frame2,...word2framen]]

Y = [[0,0,...,1], [0,0,....,2]]

In X every element is a vector of 13 floats. In Y every element is just a number - 0 for intermediate frames and word ID for final frame.

To train with just labels you need to place input and output labels to arrays and output array is simpler. So the data will be:

X = [[word1frame1, word1frame2, ..., word1framen],[word2frame1, word2frame2,...word2framen]]

Y = [[0,0,1], [0,1,0]]

Note that output is vectorized (np_utils.to_categorical) to turn it to vectors instead of just numbers.

Then you create network architecture. You can have 13 floats for input, a vector for output. In the middle you might have one fully connected layer followed by one lstm layer. Do not use too big layers, start with small ones.

Then you feed this dataset into model.fit and it trains you the model. You can estimate model quality on heldout set after training.

You will have a problem with convergence since you have just 20 examples. You need way more examples, preferably thousands to train LSTM, you will only be able to use very small models.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!