How to classify continuous audio

落花浮王杯 提交于 2019-12-04 07:15:17

First, you need to annotate your events in the sound streams, i.e. specify bounds and labels for them.

Then, convert your sounds into sequences of feature vectors using signal framing. Typical choices are MFCCs or log-mel filtebank features (the latter corresponds to a spectrogram of a sound). Having done this, you will convert your sounds into sequences of fixed-size feature vectors that can be fed into a classifier. See this. for better explanation.

Since typical sounds have a longer duration than an analysis frame, you probably need to stack several contiguous feature vectors using sliding window and use these stacked frames as input to your NN.

Now you have a) input data and b) annotations for each window of analysis. So, you can try to train a DNN or a CNN or a RNN to predict a sound class for each window. This task is known as spotting. I suggest you to read Sainath, T. N., & Parada, C. (2015). Convolutional Neural Networks for Small-footprint Keyword Spotting. In Proceedings INTERSPEECH (pp. 1478–1482) and to follow its references for more details.

You can use a recurrent neural network (RNN).

https://www.tensorflow.org/versions/r0.12/tutorials/recurrent/index.html

The input data is a sequence and you can put a label in every sample of the time series.

For example a LSTM (a kind of RNN) is available in libraries like tensorflow.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!