What are the components of the Mel mfcc

Question

In looking at the output of this line of code:

mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)
print("MFCC Shape = ", mfccs.shape)

I get the output MFCC Shape = (40, 1876). What do these two numbers represent? I looked at the librosa website but still could not work out what these two values are.

Any insights will be greatly appreciated!

Answer 1:

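The first dimension (40) is the number of MFCCs, and the second dimension (1876) is the number of time frames. The number of MFCCs is set by n_mfcc. The number of time frames is determined by the length of the audio (in samples) and the hop_length: with librosa's defaults (hop_length=512, center=True) it is 1 + floor(n_samples / hop_length), which is roughly the number of samples divided by the hop length.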
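As a quick check of that arithmetic, here is a minimal sketch that predicts the output shape from the sample count alone. It assumes librosa's default hop_length=512 and center=True; the helper name expected_mfcc_shape and the sample count 960000 are hypothetical, chosen only because they reproduce the (40, 1876) shape from the question.

```python
def expected_mfcc_shape(n_samples, n_mfcc=40, hop_length=512):
    """Predict the shape of an MFCC matrix for a mono signal.

    Assumes centred framing (center=True), where the signal is padded so
    that one frame starts every hop_length samples, giving
    1 + n_samples // hop_length frames.
    """
    n_frames = 1 + n_samples // hop_length
    return (n_mfcc, n_frames)


# Hypothetical clip length: any length from 960000 to 960511 samples
# yields 1876 frames with a hop of 512.
print(expected_mfcc_shape(960000))
```

At a sample rate of 22050 Hz, that many samples would correspond to a clip of about 43.5 seconds.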

To understand the meaning of the MFCCs themselves, you should understand the steps it takes to compute them:

1. A spectrogram, computed with the Short-Time Fourier Transform (STFT)
2. The mel spectrogram, obtained by applying mel-scale filterbanks to the STFT spectrogram
3. The Mel Frequency Cepstral Coefficients, obtained by applying the Discrete Cosine Transform (DCT) to the log-scaled mel spectrogram
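The three steps above can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not librosa's actual implementation: it uses the HTK-style mel formula, a symmetric Hann window, and an unnormalized DCT-II, so the values will differ from librosa.feature.mfcc even though the output shape follows the same rule. The function names (hz_to_mel, mel_to_hz, mel_filterbank, simple_mfcc) are made up for this example.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (an assumption for this sketch)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def simple_mfcc(y, sr, n_mfcc=40, n_fft=2048, hop_length=512, n_mels=128):
    # Step 1: STFT power spectrogram with centred, Hann-windowed frames
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack(
        [y[t * hop_length: t * hop_length + n_fft] * window
         for t in range(n_frames)],
        axis=1,
    )
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2  # (n_fft//2 + 1, n_frames)

    # Step 2: mel spectrogram, then log scaling (decibels)
    mel_spec = mel_filterbank(sr, n_fft, n_mels) @ power  # (n_mels, n_frames)
    log_mel = 10.0 * np.log10(np.maximum(mel_spec, 1e-10))

    # Step 3: DCT-II along the mel axis, keeping the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct_basis @ log_mel  # (n_mfcc, n_frames)


# One second of a 440 Hz tone at 22050 Hz (synthetic test signal)
sr = 22050
t = np.arange(sr) / sr
mfccs = simple_mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print("MFCC Shape = ", mfccs.shape)
```

Each column of the result describes one time frame, and each row is one cepstral coefficient: row 0 tracks the overall log energy, while higher rows capture progressively finer detail in the shape of the mel spectrum.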

A good written explainer is Haytham Fayek's "Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between", and a good video explainer is The Sound of AI's "Mel-Frequency Cepstral Coefficients Explained Easily".

Source: https://stackoverflow.com/questions/65206575/what-are-the-components-of-the-mel-mfcc

Tags

librosa

mfcc
