What are the components of the Mel mfcc

Submitted by 百般思念 on 2021-01-29 17:08:24

Question


In looking at the output of this line of code:

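import librosa
# librosa_audio and librosa_sample_rate are assumed to come from an earlier librosa.load(...) call
mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)
print("MFCC Shape = ", mfccs.shape)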

I get the output MFCC Shape = (40, 1876). What do these two numbers represent? I looked at the librosa website but still could not work out what these two values mean.

Any insights will be greatly appreciated!


Answer 1:


The first dimension (40) is the number of MFCCs computed for each frame, and the second dimension (1876) is the number of time frames. The number of MFCCs is set by n_mfcc, and the number of time frames is roughly the length of the audio (in samples) divided by hop_length. With librosa's defaults (hop_length=512 and centered frames), the exact count is 1 + floor(n_samples / hop_length).
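The frame count can be checked with a little arithmetic. This is a minimal sketch, assuming librosa's defaults of hop_length=512 and center=True (the question does not say whether they were changed). The helper name frame_count is hypothetical, and 960000 is only an illustrative sample count, not the length of the asker's file.

```python
def frame_count(n_samples: int, hop_length: int = 512) -> int:
    """Number of frames for a centered STFT: 1 + floor(n_samples / hop_length).

    Assumes the signal is padded so that frames are centered on multiples of
    hop_length, which is what librosa does by default (center=True).
    """
    return 1 + n_samples // hop_length


# Illustrative value: any signal with 960000 to 960511 samples gives 1876 frames.
n_frames = frame_count(960_000)
print(n_frames)  # 1876
```

Working backwards, a second dimension of 1876 at the default hop length corresponds to roughly 960,000 audio samples.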

To understand what the MFCCs themselves mean, it helps to know the steps used to compute them:

  • A spectrogram, computed with the Short-Time Fourier Transform (STFT)
  • The Mel spectrogram, obtained by applying Mel-scale filterbanks to the STFT spectrogram
  • The Mel-Frequency Cepstral Coefficients, obtained by applying the Discrete Cosine Transform (DCT) to the log-scaled Mel spectrogram

A good written explainer is Haytham Fayek's "Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between", and a good video explainer is The Sound of AI's "Mel-Frequency Cepstral Coefficients Explained Easily".



Source: https://stackoverflow.com/questions/65206575/what-are-the-components-of-the-mel-mfcc
