extracting pitch features from audio file

笑着哭i 提交于 2019-12-02 23:52:17
Frank Zalkow

You can map frequencies to musical notes:

with

being the midi note number to be calculated,

the frequency and

the chamber pitch (in modern music 440.0 Hz is common).

As you may know a single frequency doesn't make a musical pitch. "Pitch" arises from the sensation of the fundamental of harmonic sounds, i.e. sounds that mainly consist of integer multiples of one single frequency (= the fundamental).

If you want to have Chroma Features in Python, you can use the Bregman Audio-Visual Information Toolbox. Note that chroma features don't give you information about the octave of a pitch, so you just get information about the pitch class.

from bregman.suite import Chromagram
audio_file = "mono_file.wav"
F = Chromagram(audio_file, nfft=16384, wfft=8192, nhop=2205)
F.X # all chroma features
F.X[:,0] # one feature

The general problem of extracting pitch information from audio is called pitch detection.

You can try reading the literature on pitch detection, which is quite extensive. Generally autocorrelation-based methods seem to work pretty well; frequency-domain or zero-crossing methods are less robust (so FFT doesn't really help much). A good starting point may be to implement one of these two algorithms:

As far as off-the-shelf solutions, check out Aubio, C code with python wrapper, several pitch-extraction algorithms available including YIN and multiple-comb.

If you're willing to use 3rd party libraries (at least as a reference for how other people accomplished this):

Extracting musical information from sound, a presentation from PyCon 2012, shows how to use the AudioNest Python API:

Here is the relevant EchoNest documentation:

Relevant excerpt:

pitch content is given by a “chroma” vector, corresponding to the 12 pitch classes C, C#, D to B, with values ranging from 0 to 1 that describe the relative dominance of every pitch in the chromatic scale. For example a C Major chord would likely be represented by large values of C, E and G (i.e. classes 0, 4, and 7). Vectors are normalized to 1 by their strongest dimension, therefore noisy sounds are likely represented by values that are all close to 1, while pure tones are described by one value at 1 (the pitch) and others near 0.

EchoNest does the analysis on their servers. They provide free API keys for non-commercial use.

If EchoNest is not an option, I would look at the open-source aubio project. It has python bindings, and you can examine the source to see how they accomplished pitch detection.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!