I'm looking to extract pitches from a sound signal.
Someone on IRC just explained to me how taking a double FFT achieves this. Specifically:
This answer is meant to be read in addition to Jeremy Salwen's post, and also to answer the question regarding literature.
First of all, it's important to consider the signal's periodicity: whether or not the signal is close to fully periodic within a given analysis window.
See here for a detailed explanation of the term and the maths: https://en.wikipedia.org/wiki/Almost_periodic_function#Quasiperiodic_signals_in_audio_and_music_synthesis
The short answer is that if, for a given analysis window, a signal is fully periodic, or if the signal is quasi-periodic and the analysis window is small enough that periodicity is achieved, then autocorrelation is enough for the task. Examples of signals that fulfill these conditions are:
Examples of signals that fail to fulfill these conditions are:
For pitch detection using autocorrelation there is a tutorial on how it is implemented in Praat:
The paper describes in detail the use of unbiased autocorrelation (the term as used by Jeremy Salwen) for pitch detection, and it shows that it is superior to biased autocorrelation for this purpose. It notes, however, that the autocorrelation results are only significant up to half of the window size, so you don't need to calculate the latter half.
A biased autocorrelation is done by windowing the signal with a tapering window and then computing the autocorrelation. This reduces the effects of low-frequency modulation (amplitude change at a slow time scale) that is detrimental to pitch detection, since otherwise parts with larger amplitude give a larger autocorrelation coefficient and are preferred.
The algorithm used in Boersma's paper can be described in 5 steps:
It's important to note that the window goes toward zero at both ends, and the autocorrelation of the window also goes toward zero. This is why the latter half of an unbiased autocorrelation is useless: near the end of the window it amounts to a division by a value approaching zero.
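The five steps are not reproduced here, so the following is only a minimal sketch of one reading that is consistent with the description above: remove the mean, apply a tapering window, autocorrelate, divide by the autocorrelation of the window, and pick the highest peak within the first half of the window. The function names, the Hann window, and the default frequency range are assumptions made for illustration, not details taken from the paper.

```python
import math

def hann(n):
    """Hann window of length n; it tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(x, max_lag):
    """Raw autocorrelation r[k] = sum_i x[i] * x[i + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def estimate_f0(frame, sample_rate, f_min=75.0, f_max=600.0):
    """Estimate f0 in Hz from a window-compensated autocorrelation (sketch)."""
    n = len(frame)
    mean = sum(frame) / n
    w = hann(n)
    xw = [(s - mean) * wi for s, wi in zip(frame, w)]  # remove DC, then taper
    # Only lags up to half the window are trusted (see the text above).
    max_lag = min(n // 2, int(sample_rate / f_min))
    min_lag = max(1, int(sample_rate / f_max))
    r_x = autocorr(xw, max_lag)
    r_w = autocorr(w, max_lag)
    # Normalise both to 1 at lag 0, then divide out the window's autocorrelation.
    r = [(r_x[k] / r_x[0]) / (r_w[k] / r_w[0]) for k in range(max_lag + 1)]
    best = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return sample_rate / best

# Hypothetical usage: a 200 Hz tone with one harmonic, 100 ms at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * i / sr) + 0.5 * math.sin(2 * math.pi * 400 * i / sr)
        for i in range(800)]
print(estimate_f0(tone, sr, f_min=150.0))
```

This sketch simply takes the highest peak, so if the lag range spans several periods it can land on a multiple of the true period. A practical detector needs some way to choose between such candidates, and would also interpolate around the peak for sub-sample resolution.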
Next is YIN: De Cheveigné, Alain, and Hideki Kawahara. "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical Society of America 111.4 (2002): 1917-1930.
As I understand it, the YIN paper also gives evidence that using a tapering window has detrimental effects on pitch detection accuracy. Interestingly, it prefers not to use any tapering window function (it says something to the effect that a tapering window does not bring any improvement to the results and instead complicates things).
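To make the idea concrete, here is a minimal sketch of the core of YIN as I understand it: a squared difference function over a plain rectangular window (no taper), its cumulative mean normalised form, and an absolute threshold followed by a walk to the nearest local minimum. The function name and the default values for `f_min` and `threshold` are assumptions for illustration, and the refinement steps of the paper (such as parabolic interpolation) are left out.

```python
import math

def yin_f0(frame, sample_rate, f_min=75.0, threshold=0.1):
    """Return an f0 estimate in Hz, or None if no lag falls below the threshold."""
    max_lag = int(sample_rate / f_min)
    w = len(frame) - max_lag  # integration window: rectangular, no tapering
    if w <= 0:
        raise ValueError("frame too short for the requested f_min")
    # Squared difference function d(tau).
    d = [sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))
         for tau in range(max_lag + 1)]
    # Cumulative mean normalised difference d'(tau), with d'(0) defined as 1.
    cmnd = [1.0]
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += d[tau]
        cmnd.append(d[tau] * tau / running if running > 0 else 1.0)
    # First lag below the threshold, then walk down to the local minimum.
    for tau in range(2, max_lag):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None

# Hypothetical usage: a 200 Hz sine, 100 ms at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]
print(yin_f0(tone, sr))
```

Because the difference function compares the signal with a shifted copy of itself sample by sample, a slow amplitude change shows up as a non-zero difference rather than as a preference for the louder part of the frame, which is one way to see why no tapering window is needed here.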
Last is Philip McLeod's SNAC and WSNAC (already linked by Jeremy Salwen):
They can be found on miracle.otago.ac.nz/tartini/papers.html
I haven't read too far into it, but it is mentioned as a method to reduce the detrimental effects of the tapering window in biased autocorrelation, one that differs from the method used by Boersma. (Note that I haven't come across anything about MPM, so I can't say anything about it.)
One last suggestion: if you're making an instrument tuner, a method that is easier and gives slightly better results than autocorrelation is cross-correlation with a pure sinusoidal signal of a predetermined frequency.
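A minimal sketch of that tuner idea follows, under some assumptions of my own: the signal is correlated with both a cosine and a sine at the candidate frequency so that the result does not depend on phase, and a small range of offsets (in cents) around the target note is scanned to find the best match. The names `tone_strength` and `tune` and the default span are hypothetical.

```python
import math

def tone_strength(frame, sample_rate, freq):
    """Magnitude of the correlation between `frame` and a sinusoid at `freq` Hz.

    Correlating with a cosine and a sine and combining the two results
    makes the measure independent of the phase of the input.
    """
    step = 2 * math.pi * freq / sample_rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(frame))
    im = sum(s * math.sin(step * i) for i, s in enumerate(frame))
    return math.hypot(re, im) / len(frame)

def tune(frame, sample_rate, target_hz, span_cents=50, step_cents=1):
    """Return the offset in cents (within +/- span_cents) that correlates best."""
    offsets = range(-span_cents, span_cents + 1, step_cents)
    return max(
        offsets,
        key=lambda c: tone_strength(frame, sample_rate, target_hz * 2 ** (c / 1200)),
    )
```

For example, calling `tune(frame, 8000, 440.0)` on half a second of audio would report how many cents the played note is above or below A4. The frame has to be long enough for neighbouring candidates to be distinguishable, since the frequency resolution is roughly the inverse of the frame duration.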
Jeremy Salwen:
That is, suppose you plotted the function sin(4x)+sin(6x)+sin(8x)+sin(10x). If you look at that, it is clear that it has the same frequency as the function sin(2x). However, if you apply Fourier analysis to this function, the bin corresponding to sin(2x) will have zero magnitude. Thus this signal is considered to have a "missing fundamental frequency", because it does not contain the sinusoid of the frequency which we consider it to be.
I would like to argue that although the given signal is periodic at $\omega=2$, that is not the same as having the same frequency as the function sin(2x), since Fourier analysis shows that the component sin(2x) has zero magnitude. This relates to the point that pitch, frequency, and the fundamental frequency of a signal are related, but they are different and not interchangeable. It is important to remember that pitch is a subjective measurement: it depends on the human who perceives it. The signal looks as though it has the same frequency as sin(2x); that is how we perceive it visually. A similar effect occurs in pitch and audio perception. The example that comes to mind immediately is beats, the perceived pitch that is heard when there are two sinusoids with close but different frequencies.
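Both halves of this argument can be checked numerically. The short sketch below (helper names are my own) confirms that the example signal repeats with the period of sin(2x), which is pi, while its Fourier sine coefficient at sin(2x) is zero up to rounding and its coefficient at sin(4x) is not.

```python
import math

def f(x):
    """The example signal: harmonics 2..5 of a fundamental at omega = 2."""
    return math.sin(4 * x) + math.sin(6 * x) + math.sin(8 * x) + math.sin(10 * x)

def fourier_sine_coeff(g, k, n=10000):
    """Approximate (1/pi) * integral over [0, 2*pi) of g(x) * sin(k*x) dx."""
    dx = 2 * math.pi / n
    return sum(g(i * dx) * math.sin(k * i * dx) for i in range(n)) * dx / math.pi

# Periodicity check: f(x + pi) should equal f(x) for every x.
max_dev = max(abs(f(i / 100 + math.pi) - f(i / 100)) for i in range(1000))

print("largest |f(x + pi) - f(x)|:", max_dev)
print("coefficient of sin(2x):", fourier_sine_coeff(f, 2))
print("coefficient of sin(4x):", fourier_sine_coeff(f, 4))
```

So the signal is periodic with the period of sin(2x) even though it contains no energy at that frequency, which is exactly the distinction between periodicity and spectral content made above.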