Cepstral Analysis for pitch detection


陌清茗 2020-12-07 09:53

I'm looking to extract pitches from a sound signal.

Someone on IRC just explained to me how taking a double FFT achieves this. Specifically:

take FFT

5 answers

慢半拍i (original poster)

2020-12-07 10:09
This answer is meant to be read in addition to Jeremy Salwen's post, and also to answer the question regarding literature.

First of all, it is important to consider the signal's periodicity: whether or not the signal is close to fully periodic within a given analysis window.

See here for a detailed explanation of the term and the maths: https://en.wikipedia.org/wiki/Almost_periodic_function#Quasiperiodic_signals_in_audio_and_music_synthesis

The short answer is that autocorrelation is enough for the task if, within a given analysis window, the signal is fully periodic, or if it is quasi-periodic and the analysis window is small enough that the signal is effectively periodic within it. Examples of signals that fulfill these conditions are:
- Pure sinusoidal tone
- String instruments with long sustain and stable pitch (no vibrato). This holds especially for the sustain portion, less so for the transients.
- Wind instruments playing notes that are held long enough.
Examples of signals that fail to fulfill these conditions are:
- Percussive sounds
- String or wind instruments where each note is held only briefly or changes pitch quickly
- Complex music, i.e. a combination of multiple instruments playing different pitches
For pitch detection using autocorrelation, there is a tutorial on how it is implemented in Praat, along with the paper behind it:
- "Pitch in Praat": a brief explanation of Praat's pitch detection algorithm. It describes the algorithm named 'ac'. http://www.pinguinorodriguez.cl/blog/pitch-in-praat/
- Paul Boersma. "Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound." IFA Proceedings 17: 97-110. www.fon.hum.uva.nl/paul/praat.html
The paper describes in detail the use of unbiased autocorrelation (the term used by Jeremy Salwen) for pitch detection, and shows that it is superior to biased autocorrelation for this purpose. It also notes that the autocorrelation results are only significant up to half of the window size, so you don't need to calculate the latter half.

A biased autocorrelation is computed by windowing the signal with a tapering window and then autocorrelating it. This reduces the effect of low-frequency modulation (amplitude change on a slow time scale), which is detrimental to pitch detection, since otherwise parts with larger amplitude give larger autocorrelation coefficients and are preferred.

The algorithm used in Boersma's paper can be described in 5 steps:
1. Remove DC from the signal that is going to be windowed (x - x_avg)
2. Window the signal using a taper function (he argues for a Hann window or, better, a Gaussian window)
3. Autocorrelate the windowed signal
4. Divide the autocorrelation function by the autocorrelation of the window used.
5. Peak-picking (similar to previous algorithms)
It's important to note that the window goes toward zero at both ends, so the autocorrelation of the window also goes toward zero at large lags. This is why the latter half of an unbiased autocorrelation is useless: near the end of the window it amounts to a division by nearly zero.
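The five steps above can be sketched in Python with NumPy. This is only a rough illustration, not Praat's implementation: the function names, the Hann window choice, the `fmin`/`fmax` search range, and the bare argmax used for peak-picking are all assumptions made for the example.

```python
import numpy as np


def autocorr(x):
    """Linear autocorrelation via FFT, normalized so that lag 0 equals 1."""
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad to avoid circular wrap-around
    spectrum = np.fft.rfft(x, nfft)
    r = np.fft.irfft(spectrum * np.conj(spectrum), nfft)[:n]
    return r / r[0]


def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=600.0):
    """Estimate the pitch of one analysis frame in Hz (hypothetical helper)."""
    n = len(frame)
    window = np.hanning(n)                  # step 2: taper (Hann, for simplicity)
    x = (frame - np.mean(frame)) * window   # step 1 (remove DC), then step 2
    r_signal = autocorr(x)                  # step 3
    r_window = autocorr(window)
    half = n // 2                           # only the first half is trustworthy
    r = r_signal[:half] / r_window[:half]   # step 4
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), half - 1)
    # Step 5: naive peak-picking, restricted to the allowed lag range.
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return sample_rate / lag
```

Note that multiples of the true period produce peaks of nearly the same height, so a bare argmax only behaves when the search range contains a single period; a real detector needs a more careful peak-picking stage and sub-sample interpolation of the peak position.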

Next is YIN:
- De Cheveigné, Alain, and Hideki Kawahara. "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical Society of America 111.4 (2002): 1917-1930.

As I understand it, the YIN paper also gives evidence that using a taper window has detrimental effects on pitch detection accuracy. Interestingly, it prefers not to use any tapering window function (it says something to the effect that a tapering window does not improve the results and instead complicates things).

Last is Philip McLeod's SNAC and WSNAC (already linked by Jeremy Salwen):
- McLeod, P. "Fast, Accurate Pitch Detection Tools for Music Analysis." PhD thesis, Department of Computer Science, University of Otago, 2008.
- McLeod, P., and Wyvill, G. "A Smarter Way to Find Pitch." Proc. International Computer Music Conference, Barcelona, Spain, September 5-9, 2005, pp. 138-141.
- McLeod, P., and Wyvill, G. "Visualization of Musical Pitch." Proc. Computer Graphics International, Tokyo, Japan, July 9-11, 2003, pp. 300-303.
They can be found at miracle.otago.ac.nz/tartini/papers.html

I haven't read too far into it, but it is mentioned as a method to reduce the detrimental effects of the tapering window in biased autocorrelation, one that differs from the method used by Boersma. (Note that I haven't come across anything about MPM, so I can't say anything about it.)

One last suggestion: if you're making an instrument tuner, a method that is easier and gives somewhat better results than autocorrelation is cross-correlation with a pure sinusoidal signal of a predetermined frequency.
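One way to read this suggestion is sketched below in Python with NumPy. The function names and the idea of scanning a small grid of candidate frequencies around the target note are assumptions made for illustration. The frame is correlated with a complex sinusoid, which covers the sine and cosine phases at once, so the result does not depend on the phase of the input.

```python
import numpy as np


def tone_strength(frame, sample_rate, freq):
    """Magnitude of the correlation between the frame and a sinusoid at freq."""
    t = np.arange(len(frame)) / sample_rate
    probe = np.exp(-2j * np.pi * freq * t)  # cosine and sine probe in one
    return np.abs(np.dot(frame, probe)) / len(frame)


def closest_candidate(frame, sample_rate, candidates):
    """Return the candidate frequency that correlates best with the frame."""
    strengths = [tone_strength(frame, sample_rate, f) for f in candidates]
    return candidates[int(np.argmax(strengths))]
```

For a tuner, `candidates` could be a fine grid around the note being tuned, for example `np.arange(430.0, 451.0, 1.0)` around A4. Longer frames make the correlation peak narrower, so the achievable resolution depends on the frame length.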

Jeremy Salwen:

That is, suppose you plotted the function sin(4x)+sin(6x)+sin(8x)+sin(10x). If you look at that, it is clear that it has the same frequency as the function sin(2x). However, if you apply Fourier analysis to this function, the bin corresponding to sin(2x) will have zero magnitude. Thus this signal is considered to have a "missing fundamental frequency", because it does not contain the sinusoid of the frequency which we consider it to be.

I would argue that although the given signal is periodic at \omega=2, that is not the same as having the same frequency as the function sin(2x), since Fourier analysis shows that the component sin(2x) has zero magnitude. This relates to the point that pitch, frequency, and the fundamental frequency of a signal are related, but they are different and not interchangeable. It is important to remember that pitch is a subjective measure: it depends on the human who perceives it. The signal looks as though it has the same frequency as sin(2x) because that is how we perceive it visually. A similar effect occurs in pitch and audio perception. The example that comes to mind immediately is beats: the perceived pitch heard when two sinusoids have close but different frequencies.
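Both observations about sin(4x)+sin(6x)+sin(8x)+sin(10x) can be checked numerically. The sketch below (Python with NumPy; the sample count and variable names are arbitrary choices for the example) samples the function over one interval of length 2π, reads off the Fourier magnitude at sin(2x), and then finds the repetition period from a circular autocorrelation.

```python
import numpy as np

n = 1024
x = 2 * np.pi * np.arange(n) / n  # one interval of length 2*pi
signal = np.sin(4 * x) + np.sin(6 * x) + np.sin(8 * x) + np.sin(10 * x)

# On this interval, DFT bin k corresponds to the component at sin(k x).
magnitudes = np.abs(np.fft.rfft(signal)) / (n / 2)
fundamental_magnitude = magnitudes[2]            # component at sin(2x)
harmonic_magnitudes = magnitudes[[4, 6, 8, 10]]  # components actually present

# Circular autocorrelation: the first lag at which the signal matches itself.
spectrum = np.fft.fft(signal)
acf = np.fft.ifft(spectrum * np.conj(spectrum)).real
acf /= acf[0]
period_samples = 1 + int(np.argmax(acf[1:n // 2 + 1]))
period = period_samples * 2 * np.pi / n          # in units of x

print(fundamental_magnitude, harmonic_magnitudes, period)
```

The magnitude at sin(2x) comes out as numerical noise while the four harmonics have magnitude close to 1, yet the autocorrelation finds a period of π, which is the period of sin(2x). So the spectrum and the periodicity tell two different stories about the same signal.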