Accelerate framework vDSP, FFT framing

Question

I'm trying to implement FFT calculation, using Apple's vDSP, on a recorded audio file (let's assume it's a mono PCM).

I did some research here and found the following topics quite useful:

Using the apple FFT and accelerate Framework
Extracting precise frequencies from FFT Bins using phase change between frames
Reading audio with Extended Audio File Services (ExtAudioFileRead)

For example, say we configure the FFT with a frame size of N = 1024 samples, so log2n = 10:

m_setupReal = vDSP_create_fftsetup(LOG_2N, FFT_RADIX2);

// allocate space for a hamming window
m_hammingWindow = (float *) malloc(sizeof(float) * N);

// generate the window values and store them in the hamming window buffer
// (flag 0 = full-size window; vDSP_HANN_NORM is a vDSP_hann_window flag)
vDSP_hamm_window(m_hammingWindow, N, 0);

Then, somewhere later in the code:

vDSP_vmul(dataFrame, 1, m_hammingWindow, 1, dataFrame, 1, N);

vDSP_ctoz((COMPLEX *)dataFrame, 2, &(m_splitComplex), 1, nOver2);

// Do real->complex forward FFT
vDSP_fft_zrip(m_setupReal, &(m_splitComplex), 1, LOG_2N, kFFTDirection_Forward);

What I'm missing right now in my understanding of FFT usage is how to get the complete spectrum of a large audio file, say 12800 samples in total.

Q: Do I need to split the raw data into frames of 1024 samples (12800 / 1024 = 12.5, so roughly 13 frames), perform an FFT on each frame separately, and then somehow average the 13 FFT results into a final spectrum? If that assumption is correct, how should the averaging be performed?

I'd really appreciate any help.

Answer 1:

You don't want to average the spectra, unless you have a statistically stationary signal. If it's something time-varying like speech or music, then you effectively have a 3D data set: time versus frequency versus magnitude, which you can plot as a spectrogram or waterfall plot.

Note also that it is common practice to overlap successive windows, to gain more resolution in the time axis, so your first block might be samples 0..1023 and then the second block with 50% overlap would be 512..1535, etc.
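The framing described above can be sketched in plain C. This is only an illustration, not part of vDSP: the helper names full_frame_count and copy_frame, and the choice to zero-pad a short final frame, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of frames of length n that fit entirely inside `total` samples
   when successive frames start `hop` samples apart
   (hop == n: no overlap, hop == n / 2: 50% overlap). */
size_t full_frame_count(size_t total, size_t n, size_t hop)
{
    assert(n > 0 && hop > 0);
    if (total < n) return 0;
    return (total - n) / hop + 1;
}

/* Copy frame number `index` into `frame` (length n). Samples past the end
   of the signal are set to zero (an assumed policy for the last frame).
   Returns the number of real samples copied. */
size_t copy_frame(const float *signal, size_t total, size_t index,
                  size_t n, size_t hop, float *frame)
{
    size_t start = index * hop;
    size_t avail = (start < total) ? total - start : 0;
    if (avail > n) avail = n;
    if (avail > 0) memcpy(frame, signal + start, avail * sizeof(float));
    memset(frame + avail, 0, (n - avail) * sizeof(float));
    return avail;
}
```

With the numbers from the question, full_frame_count(12800, 1024, 512) gives 24 frames at 50% overlap, and full_frame_count(12800, 1024, 1024) gives 12 non-overlapping frames; a 13th, zero-padded frame would cover the remaining 512 samples. Each frame filled by copy_frame can then be windowed and passed to the FFT as in the question.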

Answer 2:

On the other hand, if your signal is stationary and mixed with some amount of noise, then averaging the magnitude results of multiple FFTs bin by bin (strictly speaking, the squared magnitudes) gives you Welch's method, which may improve the signal-to-noise ratio of the resulting averaged magnitude spectrum.
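As a rough sketch of that averaging step, assume the squared magnitudes of every frame have already been computed (with vDSP, for instance via vDSP_zvmags on the split-complex FFT output, keeping in mind that vDSP_fft_zrip packs the DC and Nyquist terms into the first element). The function name average_power_spectra and the row-major buffer layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Bin-by-bin mean of per-frame power spectra.
   power: num_frames rows of num_bins squared magnitudes, row-major.
   avg:   num_bins outputs, overwritten with the mean over all frames. */
void average_power_spectra(const float *power, size_t num_frames,
                           size_t num_bins, float *avg)
{
    assert(num_frames > 0);

    for (size_t k = 0; k < num_bins; ++k)
        avg[k] = 0.0f;

    /* Accumulate each frame's power into the running sum per bin. */
    for (size_t f = 0; f < num_frames; ++f)
        for (size_t k = 0; k < num_bins; ++k)
            avg[k] += power[f * num_bins + k];

    for (size_t k = 0; k < num_bins; ++k)
        avg[k] /= (float)num_frames;
}
```

The averaging is done on power (squared magnitude) rather than on the complex FFT outputs, because the phases of successive frames generally differ and complex values would partly cancel.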

Also, again if the signal is stationary, the differences in phase between the FFT bins of offset windows can be used, as in the phase vocoder algorithm, to refine spectral frequency estimates. If the signal is stationary only for short intervals of time, then one might want to do this only for windows that fit inside those intervals, perhaps by reducing window offsets (increasing overlaps).
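A minimal sketch of that phase-difference refinement for a single bin, under the usual phase vocoder assumption that one stable sinusoid dominates the bin. The function name refine_bin_frequency and its parameters are hypothetical; the two phases would come from the same bin k in two successive frames whose starts are hop samples apart.

```c
#include <assert.h>

#define PV_PI 3.14159265358979323846

/* Wrap an angle in radians into [-pi, pi]. */
static double wrap_phase(double x)
{
    while (x > PV_PI)  x -= 2.0 * PV_PI;
    while (x < -PV_PI) x += 2.0 * PV_PI;
    return x;
}

/* Estimate the frequency in Hz of the sinusoid in bin k of an n-point FFT,
   given the phases (radians) of that bin in two frames `hop` samples apart. */
double refine_bin_frequency(unsigned k, unsigned n, unsigned hop,
                            double sample_rate,
                            double phase_prev, double phase_curr)
{
    assert(n > 0 && hop > 0);

    /* Phase advance expected if the sinusoid sat exactly at the bin centre. */
    double expected = 2.0 * PV_PI * (double)k * (double)hop / (double)n;

    /* Deviation of the measured advance from the expected one. */
    double deviation = wrap_phase(phase_curr - phase_prev - expected);

    /* Convert the phase deviation into a fractional bin offset. */
    double bin_offset = deviation * (double)n / (2.0 * PV_PI * (double)hop);

    return ((double)k + bin_offset) * sample_rate / (double)n;
}
```

Because the deviation is wrapped into [-pi, pi], the offset is only unambiguous within n / (2 * hop) bins either side of the bin centre, which is one reason smaller window offsets (larger overlaps) help here.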

So, it depends on the signal, and what information you want from the FFTs.

Source: https://stackoverflow.com/questions/19832317/accelerate-framework-vdsp-fft-framing

Tags: signal-processing, fft, core-audio, accelerate-framework, vdsp